Modification of RNA-Related Enzymes for Enhanced Production

ABSTRACT

The present invention provides, among other things, methods and compositions for large-scale production of capped mRNA using a SUMO-Guanylyl Transferase fusion protein.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/165,372, filed Oct. 19, 2018, which is a continuation of U.S. patent application Ser. No. 15/294,249, filed Oct. 14, 2016, now U.S. Pat. No. 10,144,942, issued Dec. 4, 2018, which claims priority to U.S. Provisional Application Ser. No. 62/241,350, filed Oct. 14, 2015, the disclosure of which is hereby incorporated by reference.

SEQUENCE LISTING

The present specification makes reference to a Sequence Listing (submitted electronically as a .txt file named “SL_SHR-1187US” on Oct. 14, 2016). The .txt file was generated Oct. 14, 2016 and is 28,402 bytes in size. The entire contents of the Sequence Listing are herein incorporated by reference.

BACKGROUND

Messenger RNA (“mRNA”) therapy is becoming an increasingly important approach for the treatment of a variety of diseases. Effective mRNA therapy requires effective delivery of the mRNA to the patient and efficient production of the protein encoded by the mRNA within the patient's body. To optimize mRNA delivery and protein production in vivo, a proper cap is typically required at the 5′ end of the construct, which protects the mRNA from degradation and facilitates successful protein translation. Therefore, the large-scale production of enzymes capable of capping mRNA is particularly important for producing mRNA for therapeutic applications.

SUMMARY OF THE INVENTION

The present invention provides improved methods for effective production of enzymes capable of capping mRNA. The present invention is, in part, based on the surprising discovery that modifying a guanylyl transferase (GT) with a SUMO tag makes it possible to produce GT on the large scale needed for producing capped mRNA for therapeutic applications.

Thus, in one aspect, the present invention provides methods of producing a capped RNA or RNA analog oligonucleotide, wherein a fusion protein facilitates the steps of transferring and methylating a guanylyl molecule to the 5′ end of the RNA or RNA analog oligonucleotide.

In some embodiments, the fusion protein comprises a guanylyl transferase and a small ubiquitin-like molecule (SUMO) protein. In some embodiments, the guanylyl transferase comprises SEQ ID NO: 6 and SEQ ID NO: 7 and the SUMO protein comprises SEQ ID NO: 1. In some embodiments, the fusion protein comprises SEQ ID NO: 8 and SEQ ID NO: 7.

In some embodiments, the one end of the RNA or RNA analog oligonucleotide is the 5′ end.

In some embodiments, the fusion protein has comparable phosphatase activity, guanylyl transferase activity and methylation activity relative to a wild-type guanylyl transferase protein.

In another aspect, the present invention provides fusion proteins, wherein a fusion protein comprises guanylyl transferase and a small ubiquitin-like molecule (SUMO) protein.

In some embodiments, the guanylyl transferase comprises SEQ ID NO: 6 and SEQ ID NO: 7 and the SUMO protein comprises SEQ ID NO: 1. In some embodiments, the guanylyl transferase comprises a large subunit and a small subunit. In some embodiments, the SUMO protein is covalently linked and co-expressed with the large subunit. In some embodiments, the fusion protein has comparable phosphatase activity, guanylyl transferase activity and methylation activity relative to a wild-type guanylyl transferase protein.

In another aspect, the present invention provides vectors encoding a fusion protein comprising guanylyl transferase protein and a small ubiquitin-like molecule (SUMO) protein.

In some embodiments, the vector comprises SEQ ID NO: 5 and SEQ ID NO: 2. In some embodiments, the vector comprises SEQ ID NO: 5, SEQ ID NO: 2, and SEQ ID NO: 3. In some embodiments, the vector comprises SEQ ID NO: 4 and SEQ ID NO: 3.

In another aspect, the present invention provides methods to produce a guanylyl transferase by fermentation, comprising: a) culturing in a fermentation medium a microorganism that is transformed with at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding a guanylyl transferase that has an amino acid sequence that is at least 90% identical to SEQ ID NO: 6 and SEQ ID NO: 7; and b) collecting a product produced from the step of culturing.

In some embodiments, the guanylyl transferase comprises a guanylyl transferase fusion protein. In some embodiments, the guanylyl transferase fusion protein has comparable phosphatase activity, guanylyl transferase activity and methylation activity relative to a wild-type guanylyl transferase protein. In some embodiments, the guanylyl transferase fusion protein comprises a small ubiquitin-like molecule (SUMO) protein. In some embodiments, the guanylyl transferase fusion protein comprises SEQ ID NO: 8.

In some embodiments, the SUMO protein is bound to the guanylyl transferase by a covalent link. In some embodiments, the covalent link is between the SUMO protein and a large subunit of the guanylyl transferase.

In some embodiments, the fermentation medium is selected from the group consisting of Terrific Broth, Cinnabar, 2×YT and LB. In some embodiments, the microorganism is a bacterium.

In some embodiments, the nucleic acid sequence encoding the guanylyl transferase is at least 90% identical to SEQ ID NO: 2 and SEQ ID NO: 3.

In some embodiments, the recombinant nucleic acid molecule further comprises a nucleic acid sequence encoding a small ubiquitin-like molecule (SUMO) protein. In some embodiments, the nucleic acid sequence encoding a small ubiquitin-like molecule (SUMO) protein is at least 90% identical to SEQ ID NO: 5.

In some embodiments, the product is a guanylyl transferase. In some embodiments, the product is a guanylyl transferase comprising a guanylyl transferase fusion protein. In some embodiments, the guanylyl transferase fusion protein further comprises a small ubiquitin-like molecule (SUMO) protein.

BRIEF DESCRIPTION OF THE DRAWING

The drawings are for illustration purposes and are in no way limiting.

FIGS. 1A and 1B are diagrams of exemplary mRNA capped structures present in various embodiments of the invention.

FIG. 2 demonstrates exemplary yield of soluble SUMO-GT protein produced by fermentation compared to that of GT protein produced via the shake flask method.

DEFINITIONS

In order for the present invention to be more readily understood, certain terms are first defined. Additional definitions for the following terms and other terms are set forth throughout the specification.

Approximately: As used herein, the term “approximately” or “about,” as applied to one or more values of interest, refers to a value that is similar to a stated reference value. In certain embodiments, the term “approximately” or “about” refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).
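As a purely illustrative sketch, the “within X% in either direction” reading of this definition can be expressed as a symmetric tolerance check. The function name `is_approximately` and its parameters are hypothetical and do not appear in the specification.

```python
def is_approximately(value: float, reference: float, percent: float) -> bool:
    """Return True if `value` lies within +/- `percent` % of `reference`.

    Hypothetical helper: one plain reading of "within X% in either
    direction of the stated reference value".
    """
    # Compare scaled quantities so no division by the reference is needed.
    return abs(value - reference) * 100 <= percent * abs(reference)


print(is_approximately(110, 100, 10))  # 110 is within 10% of 100
print(is_approximately(126, 100, 25))  # 126 is outside 25% of 100
```

The percentage is passed explicitly because the definition lists many permissible ranges rather than a single one.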

Batch culture: As used herein, the term “batch culture” refers to a method of culturing cells in which all the components that will ultimately be used in culturing the cells, including the medium (see definition of “medium” below) as well as the cells themselves, are provided at the beginning of the culturing process. Thus, a batch culture typically refers to a culture allowed to progress from inoculation to conclusion without refeeding the cultured cells with fresh medium. A batch culture is typically stopped at some point and the cells and/or components in the medium are harvested and optionally purified.

Biologically active: As used herein, the phrase “biologically active” refers to a characteristic of any substance that has activity in a biological system (e.g., cell culture, organism, etc.). For instance, a substance that, when administered to an organism, has a biological effect on that organism, is considered to be biologically active. Biological activity can also be determined by in vitro assays (for example, in vitro enzymatic assays such as sulfate release assays). In particular embodiments, where a protein or polypeptide is biologically active, a portion of that protein or polypeptide that shares at least one biological activity of the protein or polypeptide is typically referred to as a “biologically active” portion. In some embodiments, a protein is produced and/or purified from a cell culture system, which displays biological activity when administered to a subject. In some embodiments, a protein requires further processing in order to become biologically active. In some embodiments, a protein requires posttranslational modification such as, but not limited to, glycosylation (e.g., sialylation), farnesylation, cleavage, folding, formylglycine conversion and combinations thereof, in order to become biologically active. In some embodiments, a protein produced as a proform (i.e., immature form) may require additional modification to become biologically active.

Bioreactor: As used herein, the term “bioreactor” refers to a vessel used for the growth of a host cell culture. A bioreactor can be of any size so long as it is useful for the culturing of mammalian cells. Typically, a bioreactor will be at least 1 liter and may be 10, 100, 250, 500, 1000, 2500, 5000, 8000, 10,000, 12,000 liters or more, or any volume in between. Internal conditions of a bioreactor, including, but not limited to pH, osmolarity, CO2 saturation, O2 saturation, temperature and combinations thereof, are typically controlled during the culturing period. A bioreactor can be composed of any material that is suitable for holding cells in media under the culture conditions of the present invention, including glass, plastic or metal. In some embodiments, a bioreactor may be used for performing animal cell culture. In some embodiments, a bioreactor may be used for performing mammalian cell culture. In some embodiments, a bioreactor may be used with cells and/or cell lines derived from such organisms as, but not limited to, mammalian cells, insect cells, bacterial cells, yeast cells and human cells. In some embodiments, a bioreactor is used for large-scale cell culture production and is typically at least 100 liters and may be 200, 500, 1000, 2500, 5000, 8000, 10,000, 12,000 liters or more, or any volume in between. One of ordinary skill in the art will be aware of and will be able to choose suitable bioreactors for use in practicing the present invention.

Cell density: As used herein, the term “cell density” refers to the number of cells present in a given volume of medium.

Cell culture or culture: As used herein, these terms refer to a cell population that is grown in a medium under conditions suitable for survival and/or growth of the cell population. As will be clear to those of ordinary skill in the art, these terms as used herein may refer to the combination comprising the cell population and the medium in which the population is grown.

Cultivation: As used herein, the term “cultivation” or grammatical equivalents refers to a process of maintaining cells under conditions favoring growth or survival. The terms “cultivation” and “cell culture” or any synonyms are used interchangeably in this application.

Culture vessel: As used herein, the term “culture vessel” refers to any container that can provide an aseptic environment for culturing cells. Exemplary culture vessels include, but are not limited to, glass, plastic, or metal containers.

Expression: As used herein, “expression” of a nucleic acid sequence refers to one or more of the following events: (1) production of an RNA template from a DNA sequence (e.g., by transcription); (2) processing of an RNA transcript (e.g., by splicing, editing, 5′ cap formation, and/or 3′ end formation); (3) translation of an RNA into a polypeptide or protein; and/or (4) post-translational modification of a polypeptide or protein.

Fed-batch culture: As used herein, the term “fed-batch culture” refers to a method of culturing cells in which additional components are provided to the culture at some time subsequent to the beginning of the culture process. The provided components typically comprise nutritional supplements for the cells which have been depleted during the culturing process. A fed-batch culture is typically stopped at some point and the cells and/or components in the medium are harvested and optionally purified.

Homology: As used herein, the term “homology” refers to the overall relatedness between polymeric molecules, e.g., between nucleic acid molecules (e.g., DNA molecules and/or RNA molecules) and/or between polypeptide molecules. In some embodiments, polymeric molecules are considered to be “homologous” to one another if their sequences are at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% identical. In some embodiments, polymeric molecules are considered to be “homologous” to one another if their sequences are at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99% similar.

Identity: As used herein, the term “identity” refers to the overall relatedness between polymeric molecules, e.g., between nucleic acid molecules (e.g., DNA molecules and/or RNA molecules) and/or between polypeptide molecules. Calculation of the percent identity of two nucleic acid sequences, for example, can be performed by aligning the two sequences for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second nucleic acid sequences for optimal alignment and non-identical sequences can be disregarded for comparison purposes). In certain embodiments, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or substantially 100% of the length of the reference sequence. The nucleotides at corresponding nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which needs to be introduced for optimal alignment of the two sequences. The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. For example, the percent identity between two nucleotide sequences can be determined using the algorithm of Meyers and Miller (CABIOS, 1989, 4: 11-17), which has been incorporated into the ALIGN program (version 2.0) using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4. The percent identity between two nucleotide sequences can, alternatively, be determined using the GAP program in the GCG software package using an NWSgapdna.CMP matrix. Various other sequence alignment programs are available and can be used to determine sequence identity such as, for example, Clustal.
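The position-by-position comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the ALIGN or GAP algorithm: it assumes the two sequences have already been aligned by some other tool, that gaps are written as “-”, and that identity is reported as identical positions divided by alignment columns (one common convention among several). The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical columns / all alignment columns,
    ignoring columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information
        columns += 1
        if a == b:
            matches += 1  # same residue at corresponding positions
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns


# Toy alignment: 4 identical positions out of 6 columns.
print(round(percent_identity("ACGT-A", "ACCTGA"), 2))
```

A gapped column counts against identity here; other conventions divide by the shorter sequence length or by ungapped columns only, and would give different percentages for the same alignment.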

Integrated Viable Cell Density: As used herein, the term “integrated viable cell density” refers to the average density of viable cells over the course of the culture multiplied by the amount of time the culture has run. Assuming the amount of polypeptide and/or protein produced is proportional to the number of viable cells present over the course of the culture, integrated viable cell density is a useful tool for estimating the amount of polypeptide and/or protein produced over the course of the culture.
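For illustration only, integrated viable cell density can be estimated from discrete samples by trapezoidal integration, which is equivalent to the time-weighted average density multiplied by the elapsed time. The function name, the sampling times, and the density values below are hypothetical and are not data from this specification.

```python
def integrated_viable_cell_density(times_h, densities):
    """Trapezoidal estimate of IVCD (e.g., cells*h/mL).

    times_h: sampling times in hours, strictly increasing.
    densities: viable cell densities at those times (e.g., cells/mL).
    """
    if len(times_h) != len(densities) or len(times_h) < 2:
        raise ValueError("need at least two paired samples")
    total = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of one trapezoid: interval length times mean density.
        total += dt * (densities[i] + densities[i - 1]) / 2.0
    return total


# Made-up samples at 0, 24 and 48 hours.
times = [0, 24, 48]
vcd = [1e6, 2e6, 4e6]
ivcd = integrated_viable_cell_density(times, vcd)
print(ivcd)                            # integrated density
print(ivcd / (times[-1] - times[0]))   # average density over the run
```

Dividing the result by the run time recovers the average viable cell density used in the definition.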

Isolated: As used herein, the term “isolated” refers to a substance and/or entity that has been (1) separated from at least some of the components with which it was associated when initially produced (whether in nature and/or in an experimental setting), and/or (2) produced, prepared, and/or manufactured by the hand of man. Isolated substances and/or entities may be separated from about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more than about 99% of the other components with which they were initially associated. In some embodiments, isolated agents are about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more than about 99% pure. As used herein, a substance is “pure” if it is substantially free of other components. As used herein, calculation of percent purity of isolated substances and/or entities should not include excipients (e.g., buffer, solvent, water, etc.).
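The rule that excipients are left out of the purity calculation can be illustrated with a short sketch. The component names and masses below are invented for the example, and `percent_purity` is a hypothetical helper, not a method defined by this specification.

```python
def percent_purity(masses, target, excipients=frozenset()):
    """Percent purity of `target`, ignoring excipient components.

    masses: mapping of component name -> mass (any consistent unit).
    excipients: names excluded from the calculation (buffer, solvent, ...).
    """
    counted = {name: m for name, m in masses.items() if name not in excipients}
    total = sum(counted.values())
    if target not in counted or total <= 0:
        raise ValueError("target must be a counted component with positive total mass")
    return 100.0 * counted[target] / total


# Invented composition of a preparation, in milligrams.
sample = {"SUMO-GT": 9.0, "host cell protein": 1.0, "buffer": 90.0}
print(percent_purity(sample, "SUMO-GT", excipients={"buffer"}))
```

Without the exclusion, the same preparation would appear to be only 9% target by mass, which is why the definition removes excipients from the denominator.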

Medium: As used herein, the term “medium” refers to a solution containing nutrients which nourish growing cells. Typically, these solutions provide essential and non-essential amino acids, vitamins, energy sources, lipids, and trace elements required by the cell for minimal growth and/or survival. The solution may also contain components that enhance growth and/or survival above the minimal rate, including hormones and growth factors. In some embodiments, medium is formulated to a pH and salt concentration optimal for cell survival and proliferation. In some embodiments, medium may be a “chemically defined medium,” i.e., a serum-free medium that contains no proteins, hydrolysates or components of unknown composition. In some embodiments, chemically defined medium is free of animal-derived components and all components within the medium have a known chemical structure. In some embodiments, medium may be a “serum based medium,” i.e., a medium that has been supplemented with animal derived components such as, but not limited to, fetal calf serum, horse serum, goat serum, donkey serum and/or combinations thereof.

Nucleic acid: As used herein, the term “nucleic acid,” in its broadest sense, refers to a compound and/or substance that is or can be incorporated into an oligonucleotide chain. In some embodiments, a nucleic acid is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides). In some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising individual nucleic acid residues. As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably. In some embodiments, “nucleic acid” encompasses RNA as well as single and/or double-stranded DNA and/or cDNA. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, i.e., analogs having other than a phosphodiester backbone. For example, the so-called “peptide nucleic acids,” which are known in the art and have peptide bonds instead of phosphodiester bonds in the backbone, are considered within the scope of the present invention. The term “nucleotide sequence encoding an amino acid sequence” includes all nucleotide sequences that are degenerate versions of each other and/or encode the same amino acid sequence. Nucleotide sequences that encode proteins and/or RNA may include introns. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, backbone modifications, etc. A nucleic acid sequence is presented in the 5′ to 3′ direction unless otherwise indicated. The term “nucleic acid segment” is used herein to refer to a nucleic acid sequence that is a portion of a longer nucleic acid sequence. In many embodiments, a nucleic acid segment comprises at least 3, 4, 5, 6, 7, 8, 9, 10, or more residues. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5′-N-phosphoramidite linkages). In some embodiments, the present invention is specifically directed to “unmodified nucleic acids,” meaning nucleic acids (e.g., polynucleotides and residues, including nucleotides and/or nucleosides) that have not been chemically modified in order to facilitate or achieve delivery.

Perfusion process: As used herein, the term “perfusion process” refers to a method of culturing cells in which additional components are provided continuously or semi-continuously to the culture subsequent to the beginning of the culture process. The provided components typically comprise nutritional supplements for the cells which have been depleted during the culturing process. A portion of the cells and/or components in the medium are typically harvested on a continuous or semi-continuous basis and are optionally purified. Typically, a cell culture process involving a perfusion process is referred to as “perfusion culture.” Typically, nutritional supplements are provided in a fresh medium during a perfusion process. In some embodiments, a fresh medium may be identical or similar to the base medium used in the cell culture process. In some embodiments, a fresh medium may be different than the base medium but containing desired nutritional supplements. In some embodiments, a fresh medium is a chemically-defined medium.

Seeding: As used herein, the term “seeding” refers to the process of providing a cell culture to a bioreactor or another vessel for large scale cell culture production. In some embodiments a “seed culture” is used, in which the cells have been propagated in a smaller cell culture vessel, such as a tissue-culture flask, tissue-culture plate, or tissue-culture roller bottle, prior to seeding. Alternatively, in some embodiments, the cells may have been frozen and thawed immediately prior to providing them to the bioreactor or vessel. The term refers to any number of cells, including a single cell.

Subject: As used herein, the term “subject” means any mammal, including humans. In certain embodiments of the present invention the subject is an adult, an adolescent or an infant. Also contemplated by the present invention are the administration of the pharmaceutical compositions and/or performance of the methods of treatment in utero.

Vector: As used herein, “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid with which it is associated. In some embodiments, vectors are capable of extra-chromosomal replication and/or expression of nucleic acids to which they are linked in a host cell such as a eukaryotic and/or prokaryotic cell. Vectors capable of directing the expression of operatively linked genes are referred to herein as “expression vectors.”

Viable cell density: As used herein, the term “viable cell density” refers to the number of living cells per unit volume.

DETAILED DESCRIPTION

The present invention provides, among other things, methods and compositions for large-scale production of capped mRNA using a SUMO-Guanylyl Transferase fusion protein.

Various aspects of the invention are described in further detail in the following subsections. The use of subsections is not meant to limit the invention. Each subsection may apply to any aspect of the invention. In this application, the use of “or” means “and/or” unless stated otherwise.

SUMO-Guanylyl Transferase Fusion Protein

Small Ubiquitin-Like Modifier (SUMO)

As used herein, a SUMO tag is any protein or a portion of a protein that can substitute for at least partial activity of a SUMO protein.

SUMO proteins are small proteins that are covalently attached to and detached from other proteins in order to modify the functions of those proteins. The modification of a protein with a SUMO protein is a post-translational modification involved in various cellular processes such as nuclear-cytosolic transport, transcriptional regulation, apoptosis, protein stability, response to stress and progression through the cell cycle. There are at least 4 SUMO paralogs in vertebrates, designated SUMO-1, SUMO-2, SUMO-3, and SUMO-4. SUMO-2 and SUMO-3 are structurally and functionally very similar and are distinct from SUMO-1. The amino acid sequence (SEQ ID NO: 1) spanning amino acids 3-92 of a typical wild-type or naturally occurring SUMO-3 protein is shown in Table 1. In addition, a codon optimized DNA sequence encoding the SUMO-3 protein is also provided in Table 1, as SEQ ID NO: 5.

TABLE 1 Small Ubiquitin-like Modifier

SUMO-3 Protein sequence (SEQ ID NO: 1):
EEKPKEGVKTENDHINLKVAGQDGSVVQFKIKRHTPLSKLMKAYCERQGLSMRQIRFRFDGQPINETDTPAQLEMEDEDTIDVFQQQTGG

SUMO-3 DNA sequence (SEQ ID NO: 5):
GAAGAGAAACCGAAAGAGGGCGTTAAGACCGAGAATGACCACATTAACCTGAAGGTCGCTGGTCAAGATGGCAGCGTGGTGCAGTTTAAGATCAAGCGTCACACGCCGTTGAGCAAGCTGATGAAGGCTTACTGCGAGCGTCAGGGTCTGAGCATGCGTCAGATCCGCTTTCGTTTCGATGGCCAGCCGATCAATGAGACTGACACCCCAGCGCAACTGG
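As an illustrative aside, the relationship between a coding DNA sequence and the protein it encodes can be checked by translating the DNA with the standard genetic code. The sketch below is an assumption-laden example, not part of the disclosed methods: the helper name `translate` is hypothetical, the standard code is assumed, and only the first 30 nucleotides of the SUMO-3 DNA sequence listed in Table 1 are used.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1 ('*' marks a stop codon).

    Trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 30 nucleotides of the SUMO-3 DNA sequence shown in Table 1.
prefix = "GAAGAGAAACCGAAAGAGGGCGTTAAGACC"
print(translate(prefix))  # EEKPKEGVKT
```

The translated prefix matches the first ten residues of the SUMO-3 protein sequence in Table 1 (EEKPKEGVKT), as expected for a DNA sequence that encodes that protein.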

Thus, in some embodiments, a SUMO protein is a human SUMO-3 protein (SEQ ID NO: 1). In some embodiments, the SUMO protein may be another SUMO paralog, such as SUMO-1, SUMO-2 or SUMO-4. In some embodiments, a suitable replacement protein may be a homologue or an analogue of human SUMO-3 protein. For example, a homologue or an analogue of SUMO-3 protein may be a modified SUMO-3 protein containing one or more amino acid substitutions, deletions, and/or insertions as compared to a wild-type or naturally-occurring SUMO-3 protein (e.g. SEQ ID NO: 1), while retaining substantial SUMO-3 protein activity. Thus, in some embodiments, an enzyme suitable for the present invention is substantially homologous to a wild-type or naturally-occurring SUMO-3 protein (SEQ ID NO: 1). In some embodiments, an enzyme suitable for the present invention has an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identical to SEQ ID NO: 1. In some embodiments, an enzyme suitable for the present invention is substantially identical to a wild-type or naturally-occurring SUMO-3 protein (SEQ ID NO: 1). In some embodiments, a protein suitable for the present invention contains a fragment or a portion of a SUMO protein. In some embodiments, the SUMO protein comprises human SUMO-1, human SUMO-2, human SUMO-3, any one of Arabidopsis thaliana SUMO-1 through SUMO-8, tomato SUMO, any one of Xenopus laevis SUMO-1 through SUMO-3, Drosophila melanogaster Smt3, Caenorhabditis elegans SMO-1, Schizosaccharomyces pombe Pmt3, malarial parasite Plasmodium falciparum SUMO, mold Aspergillus nidulans SUMO, an equivalent thereof, a homologue thereof, or a combination thereof.

In some embodiments, the SUMO protein is encoded by a nucleic acid derived from an organism selected from the group consisting of human, mouse, insect, plant, yeast, and other eukaryotic organisms. In some embodiments, the SUMO protein is encoded by a nucleic acid derived from an organism selected from the group consisting of Homo sapiens, Arabidopsis thaliana, tomato, Xenopus laevis, Drosophila melanogaster, Caenorhabditis elegans, Schizosaccharomyces pombe, Plasmodium falciparum, or Aspergillus nidulans. In some embodiments, a nucleic acid suitable for the present invention has a sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identical to SEQ ID NO: 5. In some embodiments, a nucleic acid suitable for the present invention is substantially identical to a nucleic acid encoding a wild-type or naturally-occurring SUMO-3 protein (SEQ ID NO: 5).

Guanylyl Transferase (GT)

As used herein, a GT protein is any protein or portion of a protein that can substitute for at least partial activity of naturally-occurring Guanylyl Transferase (GT) protein. As used herein, the terms “a GT protein” and “a GT enzyme” and grammatical equivalents are used interchangeably.

GT is an enzyme derived from the Vaccinia Virus system that facilitates the transfer and methylation of a guanylyl molecule to the 5′ end of a messenger RNA molecule. This process, known as mRNA capping, is highly regulated and important for the creation of stable and mature mRNA able to undergo translation during protein synthesis. The GT enzyme comprises a heterodimer that includes a “large subunit” (D1, about 97 kDa) and a “small subunit” (D12, about 33 kDa). GT provides three enzymatic functions: phosphatase activity (cleavage of the nascent 5′ triphosphate of mRNA to a diphosphate), guanylyl transferase activity (incorporation of a GTP molecule to the 5′ end of the mRNA moiety) and methylation activity (incorporation of a methyl group at the N⁷ position of the guanylyl base). The amino acid sequence of the large subunit (SEQ ID NO: 6) and small subunit (SEQ ID NO: 7) of a typical wild-type or naturally occurring GT protein are shown in Table 2. In addition, codon optimized DNA sequences encoding the large and small subunits of GT are also provided in Table 2, as SEQ ID NO: 2 and SEQ ID NO: 3, respectively.
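The three enzymatic functions listed above act in sequence on the 5′ end of the transcript. The following schematic sketch models that order as a simple state progression. It is only a bookkeeping illustration: the shorthand labels (pppN, ppN, GpppN, m7GpppN) are conventional notation assumed here rather than taken from this specification, and the function name `cap_five_prime_end` is hypothetical.

```python
# Each step: (activity, required 5' end state, resulting 5' end state).
# State labels are assumed shorthand: pppN = 5' triphosphate,
# ppN = 5' diphosphate, GpppN = guanylylated end, m7GpppN = N7-methylated cap.
CAPPING_STEPS = [
    ("phosphatase", "pppN", "ppN"),
    ("guanylyl transferase", "ppN", "GpppN"),
    ("methylation", "GpppN", "m7GpppN"),
]


def cap_five_prime_end(state: str = "pppN") -> list:
    """Apply the three activities in order and return the visited states."""
    history = [state]
    for activity, substrate, product in CAPPING_STEPS:
        if state != substrate:
            raise ValueError(f"{activity} step expects {substrate}, got {state}")
        state = product
        history.append(state)
    return history


print(" -> ".join(cap_five_prime_end()))  # pppN -> ppN -> GpppN -> m7GpppN
```

Starting from any state other than the nascent triphosphate raises an error, which reflects that each activity depends on the product of the preceding one.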

TABLE 2 Guanylyl Transferase Large subunitMDANVVSSSTIATYIDALAKNASELEQRSTAYEINNELELVFIKPPL (Protein sequence)ITLTNVVNISTIQESFIRFTVTNKEGVKIRTKIPLSKVHGLDVKNVQLVDAIDNIVWEKKSLVTENRLHKECLLRLSTEERHIFLDYKKYGSSIRLELVNLIQAKTKNFTIDFKLKYFLGSGAQSKSSLLHAINHPKSRPNTSLEIEFTPRDNETVPYDELIKELTTLSRHIFMASPENVILSPPINAPIKTFMLPKQDIVGLDLENLYAVTKTDGIPITIRVTSNGLYCYFTHLGYIIRYPVKRIIDSEVVVFGEAVKDKNWTVYLIKLIEPVNAINDRLEESKYVESKLVDICDRIVFKSKKYEGPFTTTSEVVDMLSTYLPKQPEGVILFYSKGPKSNIDFKIKKENTIDQTANVVFRYMSSEPIIFGESSIFVEYKKFSNDKGFPKEYGSGKIVLYNGVNYLNNIYCLEYINTHNEVGIKSVVVPIKFIAEFLVNGEILKPRIDKTMKYINSEDYYGNQHNIIVEHLRDQSIKIGDIFNEDKLSDVGHQYANNDKFRLNPEVSYFTNKRTRGPLGILSNYVKTLLISMYCSKTFLDDSNKRKVLAIDFGNGADLEKYFYGEIALLVATDPDADAIARGNERYNKLNSGIKTKYYKFDYIQETIRSDTFVSSVREVFYFGKFNIIDWQFAIHYSFHPRHYATVMNNLSELTASGGKVLITTMDGDKLSKLTDKKTFIIHKNLPSSENYMSVEKIADDRIVVYNPSTMSTPMTEYIIKKNDIVRVFNEYGFVLVDNVDFATIIERSKKFINGASTMEDRPSTRNFFELNRGAIKCEGLDVEDLLSYYVVY VFSKR (SEQ ID NO: 6)Small subunit MDEIVKNIREGTHVLLPFYETLPELNLSLGKSPLPSLEYGANYFLQI(Protein sequence) SRVNDLNRMPTDMLKLFTHDIMLPESDLDKVYEILKINSVKYYGRSTKADAVVADLSARNKLFKRERDAIKSNNHLTENNLYISDYKMLTFDVFRPLFDFVNEKYCIIKLPTLFGRGVIDTMRIYCSLFKNVRLLKCVSDSWLKDSAIMVASDVCKKNLDLFMSHVKSVTKSSSWKDVNSVQFSILNNPVDTEFINKFLEFSNRVYEALYYVHSLLYSSMTSDSKSIENKHQRRLVKLLL (SEQ ID NO: 7) Large subunitAGATGGAAGATGAAGATACCATCGACGTCTTTCAGCAACAGAC (DNA 
sequence)CGGTGGTATGGATGCTAACGTCGTTAGCAGCAGCACCATTGCGACTTACATTGATGCACTGGCCAAAAACGCATCTGAGCTTGAGCAGCGCAGCACCGCCTACGAGATCAATAACGAATTGGAGCTGGTTTTCATTAAACCGCCGCTGATCACGCTGACGAACGTCGTGAACATTAGCACGATTCAAGAGAGCTTTATTCGTTTCACCGTTACCAATAAAGAAGGCGTGAAGATCCGTACCAAGATTCCGCTGAGCAAAGTGCATGGTCTGGACGTGAAAAATGTGCAGCTGGTTGATGCGATCGATAACATCGTGTGGGAGAAGAAATCTTTGGTCACGGAAAATCGTCTGCACAAGGAATGTCTGCTGCGTCTGTCAACCGAAGAACGCCACATCTTCCTGGACTACAAGAAGTATGGTTCCAGCATCCGTCTGGAACTGGTGAACCTGATTCAGGCAAAGACCAAGAACTTCACCATTGACTTCAAACTGAAGTATTTCCTGGGCTCTGGTGCACAGAGCAAATCCAGCTTGTTGCACGCGATTAACCATCCGAAGAGCCGTCCGAATACGAGCCTGGAGATCGAATTCACGCCGCGTGATAACGAAACCGTTCCGTACGATGAGCTGATTAAAGAACTGACGACGTTGAGCCGCCACATCTTTATGGCCAGCCCGGAAAACGTGATCCTTAGCCCGCCTATCAATGCGCCGATTAAAACCTTTATGTTACCGAAACAAGACATTGTGGGTCTGGACCTGGAAAACCTGTACGCGGTCACCAAAACGGACGGCATTCCGATCACGATTCGTGTTACCAGCAATGGTCTGTACTGCTATTTCACTCATTTGGGCTATATCATTCGTTATCCGGTGAAACGCATCATTGATTCTGAGGTTGTCGTTTTCGGCGAAGCAGTCAAGGACAAGAATTGGACTGTGTACCTGATCAAATTGATTGAACCGGTTAACGCCATCAATGACCGCCTGGAAGAGTCGAAATATGTTGAAAGCAAACTGGTGGATATTTGTGATCGTATCGTGTTCAAGAGCAAGAAATATGAAGGCCCGTTCACCACGACCAGCGAAGTTGTTGACATGCTGAGCACCTATCTGCCGAAACAACCTGAGGGTGTGATTCTGTTTTACTCCAAGGGTCCGAAGAGCAACATTGATTTCAAAATCAAGAAAGAGAATACCATTGATCAGACCGCCAACGTTGTGTTCCGCTATATGTCCAGCGAGCCTATCATTTTCGGTGAGTCGAGCATCTTTGTTGAATACAAAAAGTTTAGCAACGATAAGGGTTTTCCGAAAGAATACGGTTCCGGTAAGATTGTGTTGTACAACGGCGTCAATTATCTGAACAACATCTACTGTCTGGAGTACATCAATACCCATAACGAAGTTGGCATTAAGTCTGTTGTCGTCCCGATCAAATTCATCGCGGAGTTCCTGGTTAACGGTGAGATTCTGAAGCCGCGTATTGATAAAACTATGAAATACATTAACTCCGAAGATTACTACGGTAATCAGCATAACATCATCGTCGAGCACTTGCGTGATCAAAGCATTAAGATCGGTGACATCTTTAACGAAGATAAGCTGAGCGATGTAGGCCACCAGTATGCGAACAATGACAAATTTCGCCTGAATCCGGAAGTCAGCTACTTTACGAATAAGCGCACCCGTGGTCCACTGGGTATCCTGAGCAATTATGTTAAAACCCTGTTGATTTCCATGTACTGCTCCAAAACGTTCCTGGACGACAGCAACAAGCGCAAAGTTCTGGCGATCGACTTCGGTAATGGTGCCGATCTGGAGAAGTACTTTTATGGTGAGATCGCATTGCTGGTTGCTACCGACCCGGATGCAGATGCGATCGCCCGTGGCAACGAGCGTTACAATAAGCTGAATAGCGGTATCAAGACCAAATACTACAAATTCGACTATATTCAAGAGACGATCCGCTCGGACACCTTTGTATCCA
GCGTGCGTGAGGTGTTTTACTTCGGTAAATTCAACATCATTGACTGGCAATTCGCCATTCACTATAGCTTTCACCCACGCCACTATGCGACGGTCATGAACAACCTGTCTGAGCTGACCGCGAGCGGCGGTAAAGTTCTGATCACCACGATGGACGGTGACAAGCTGTCTAAACTGACCGACAAAAAGACCTTCATTATTCACAAAAATCTCCCGTCGAGCGAGAATTACATGTCCGTCGAAAAGATTGCGGACGACCGTATTGTTGTCTACAACCCGAGCACTATGTCGACCCCAATGACCGAGTATATCATCAAAAAGAATGACATTGTGCGTGTCTTTAATGAATACGGTTTTGTGCTGGTCGACAACGTCGATTTTGCGACCATCATCGAGAGAAGCAAGAAATTCATTAATGGCGCTTCTACGATGGAAGATCGCCCGAGCACGCGTAACTTCTTTGAGCTGAATCGTGGCGCGATTAAGTGCGAGGGCCTGGACGTCGAGGATCTGCTGTCGTATTACGTGGTTTATGTGTTTAGCAAACGTTAATGA (SEQ ID NO: 2) Small subunitATGGACGAAATTGTCAAGAATATCCGTGAAGGTACCCACGTTT (DNA sequence)TACTGCCATTCTACGAGACGCTGCCGGAACTGAACCTGAGCCTGGGTAAAAGCCCTCTGCCGAGCCTGGAGTATGGTGCGAACTATTTTCTGCAGATTTCCCGTGTAAACGATTTGAACCGCATGCCGACGGACATGCTGAAACTGTTCACCCACGACATCATGCTGCCGGAATCTGATCTGGATAAAGTTTACGAGATCTTGAAAATCAATTCAGTGAAGTACTATGGCCGTAGCACCAAGGCCGATGCGGTGGTCGCAGACCTGAGCGCGCGTAACAAACTGTTTAAACGTGAACGTGACGCAATTAAGAGCAATAACCATCTGACCGAGAACAATTTGTACATCAGCGACTACAAGATGTTGACTTTTGACGTGTTTCGTCCGCTGTTCGACTTTGTTAATGAGAAATACTGCATTATCAAGCTGCCGACGTTGTTTGGTCGCGGCGTCATTGATACGATGCGCATTTACTGCTCTCTCTTCAAGAATGTGCGCCTGCTGAAGTGTGTCTCCGACAGCTGGCTGAAAGATAGCGCTATTATGGTTGCGAGCGACGTGTGTAAAAAGAACCTGGATCTGTTCATGAGCCACGTGAAGAGCGTTACCAAAAGCAGCAGCTGGAAAGACGTTAACAGCGTCCAGTTCTCCATTCTGAATAACCCGGTCGATACCGAGTTTATCAACAAGTTCCTTGAATTCAGCAATCGCGTTTATGAGGCCCTGTATTACGTTCATAGCCTGCTGTATAGCTCCATGACCTCTGATAGCAAATCGATCGAGAATAAACACCAACGTCGTCTGGTGAAACTGCTGCTGTAATGA (SEQ ID NO: 3)

Thus, in some embodiments, a GT enzyme is a heterodimer comprising large and small subunits (SEQ ID NO: 6 and SEQ ID NO: 7, respectively). In some embodiments, the GT enzyme of the invention may be a homologue or analogue of one or the other of the GT large and small subunits. For example, a homologue or analogue of GT protein may be a modified GT protein containing one or more amino acid substitutions, deletions, and/or insertions as compared to SEQ ID NO: 6 and/or SEQ ID NO: 7, while retaining substantial GT protein activity. Thus, in some embodiments, an enzyme suitable for the present invention is substantially homologous to the GT protein large and small subunits (SEQ ID NO: 6 and SEQ ID NO: 7). In some embodiments, an enzyme suitable for the present invention has an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identical to SEQ ID NO: 6. In some embodiments, an enzyme suitable for the present invention has an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identical to SEQ ID NO: 7. In some embodiments, an enzyme suitable for the present invention is substantially identical to the large and small subunits of GT (SEQ ID NO: 6 and SEQ ID NO: 7). In some embodiments, an enzyme suitable for the present invention contains a fragment or a portion of a GT protein.

In some embodiments, the GT protein is encoded by a nucleic acid derived from a virus selected from the group consisting of Vaccinia virus, Rabbitpox virus, Cowpox virus, Taterapox virus, Monkeypox virus, Variola major virus, Camelpox virus, Ectromelia virus, Variola minor virus, Orthopox virus, Raccoonpox virus, Skunkpox virus, Volepox virus, Yokapox virus, Swinepox virus, Yaba monkey tumor virus, Deerpox virus, Myxoma virus, Tanapox virus, Goatpox virus, Rabbit fibroma virus, Lumpy skin disease virus, Sheeppox virus, Eptesipox virus, Squirrelpox virus, Molluscum contagiosum virus, Cotia virus, Orf virus, Bovine papular stomatitis virus, Pseudocowpox virus, Canarypox virus, Pigeonpox virus, Penguinpox virus, and Fowlpox virus. In some embodiments, nucleic acids suitable for the present invention have a sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identical to SEQ ID NO: 2. In some embodiments, nucleic acids suitable for the present invention have a sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identical to SEQ ID NO: 3. In some embodiments, nucleic acids suitable for the present invention are substantially identical to a nucleic acid encoding a GT protein (SEQ ID NO: 2 and SEQ ID NO: 3).
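By way of illustration only, the percent-identity thresholds recited above could be checked with a short calculation such as the following Python sketch. It assumes the two sequences have already been aligned by some external tool (gaps written as "-"), and it assumes identity is counted as matching columns divided by all aligned columns; the function names `percent_identity` and `meets_threshold` are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity = matching columns / aligned columns, where columns
    that are gaps ("-") in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A gap paired with a residue counts as a mismatch.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True when the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions for the denominator (for example, the length of the shorter sequence) are also in use, so the convention should be stated whenever a percentage is reported.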

SUMO-GT Fusion

As used herein, a SUMO-GT fusion protein is any protein or portion of a protein that comprises a SUMO protein covalently linked to a Guanylyl Transferase (GT) protein, wherein the fusion protein can substitute for at least partial activity of naturally-occurring Guanylyl Transferase (GT) protein. As used herein, the terms “a SUMO-GT fusion protein” and “a SUMO-GT fusion enzyme” and grammatical equivalents are used interchangeably. An exemplary amino acid sequence of the fusion of SUMO and the GT large subunit (SEQ ID NO: 8) is shown in Table 3. In addition, an exemplary DNA sequence encoding the fusion of SUMO and the GT large subunit is also provided in Table 3, as SEQ ID NO: 4.

TABLE 3 SUMO-GT Fusion SUMO-GT largeATGGGCCATCATCATCACCATCACGGCAGCCTGCAAGAAGAGA subunit DNAAACCGAAAGAGGGCGTTAAGACCGAGAATGACCACATTAACCT construct with HisGAAGGTCGCTGGTCAAGATGGCAGCGTGGTGCAGTTTAAGATC tag and linkerAAGCGTCACACGCCGTTGAGCAAGCTGATGAAGGCTTACTGCGAGCGTCAGGGTCTGAGCATGCGTCAGATCCGCTTTCGTTTCGATGGCCAGCCGATCAATGAGACTGACACCCCAGCGCAACTGGAGATGGAAGATGAAGATACCATCGACGTCTTTCAGCAACAGACCGGTGGTATGGATGCTAACGTCGTTAGCAGCAGCACCATTGCGACTTACATTGATGCACTGGCCAAAAACGCATCTGAGCTTGAGCAGCGCAGCACCGCCTACGAGATCAATAACGAATTGGAGCTGGTTTTCATTAAACCGCCGCTGATCACGCTGACGAACGTCGTGAACATTAGCACGATTCAAGAGAGCTTTATTCGTTTCACCGTTACCAATAAAGAAGGCGTGAAGATCCGTACCAAGATTCCGCTGAGCAAAGTGCATGGTCTGGACGTGAAAAATGTGCAGCTGGTTGATGCGATCGATAACATCGTGTGGGAGAAGAAATCTTTGGTCACGGAAAATCGTCTGCACAAGGAATGTCTGCTGCGTCTGTCAACCGAAGAACGCCACATCTTCCTGGACTACAAGAAGTATGGTTCCAGCATCCGTCTGGAACTGGTGAACCTGATTCAGGCAAAGACCAAGAACTTCACCATTGACTTCAAACTGAAGTATTTCCTGGGCTCTGGTGCACAGAGCAAATCCAGCTTGTTGCACGCGATTAACCATCCGAAGAGCCGTCCGAATACGAGCCTGGAGATCGAATTCACGCCGCGTGATAACGAAACCGTTCCGTACGATGAGCTGATTAAAGAACTGACGACGTTGAGCCGCCACATCTTTATGGCCAGCCCGGAAAACGTGATCCTTAGCCCGCCTATCAATGCGCCGATTAAAACCTTTATGTTACCGAAACAAGACATTGTGGGTCTGGACCTGGAAAACCTGTACGCGGTCACCAAAACGGACGGCATTCCGATCACGATTCGTGTTACCAGCAATGGTCTGTACTGCTATTTCACTCATTTGGGCTATATCATTCGTTATCCGGTGAAACGCATCATTGATTCTGAGGTTGTCGTTTTCGGCGAAGCAGTCAAGGACAAGAATTGGACTGTGTACCTGATCAAATTGATTGAACCGGTTAACGCCATCAATGACCGCCTGGAAGAGTCGAAATATGTTGAAAGCAAACTGGTGGATATTTGTGATCGTATCGTGTTCAAGAGCAAGAAATATGAAGGCCCGTTCACCACGACCAGCGAAGTTGTTGACATGCTGAGCACCTATCTGCCGAAACAACCTGAGGGTGTGATTCTGTTTTACTCCAAGGGTCCGAAGAGCAACATTGATTTCAAAATCAAGAAAGAGAATACCATTGATCAGACCGCCAACGTTGTGTTCCGCTATATGTCCAGCGAGCCTATCATTTTCGGTGAGTCGAGCATCTTTGTTGAATACAAAAAGTTTAGCAACGATAAGGGTTTTCCGAAAGAATACGGTTCCGGTAAGATTGTGTTGTACAACGGCGTCAATTATCTGAACAACATCTACTGTCTGGAGTACATCAATACCCATAACGAAGTTGGCATTAAGTCTGTTGTCGTCCCGATCAAATTCATCGCGGAGTTCCTGGTTAACGGTGAGATTCTGAAGCCGCGTATTGATAAAACTATGAAATACATTAACTCCGAAGATTACTACGGTAATCAGCATAACATCATCGTCGAGCACTTGCGTGATCAAAGCATTAAGATCGGTGACATCTTTAACGAAGATAAGCTGAGCGATGTAG
GCCACCAGTATGCGAACAATGACAAATTTCGCCTGAATCCGGAAGTCAGCTACTTTACGAATAAGCGCACCCGTGGTCCACTGGGTATCCTGAGCAATTATGTTAAAACCCTGTTGATTTCCATGTACTGCTCCAAAACGTTCCTGGACGACAGCAACAAGCGCAAAGTTCTGGCGATCGACTTCGGTAATGGTGCCGATCTGGAGAAGTACTTTTATGGTGAGATCGCATTGCTGGTTGCTACCGACCCGGATGCAGATGCGATCGCCCGTGGCAACGAGCGTTACAATAAGCTGAATAGCGGTATCAAGACCAAATACTACAAATTCGACTATATTCAAGAGACGATCCGCTCGGACACCTTTGTATCCAGCGTGCGTGAGGTGTTTTACTTCGGTAAATTCAACATCATTGACTGGCAATTCGCCATTCACTATAGCTTTCACCCACGCCACTATGCGACGGTCATGAACAACCTGTCTGAGCTGACCGCGAGCGGCGGTAAAGTTCTGATCACCACGATGGACGGTGACAAGCTGTCTAAACTGACCGACAAAAAGACCTTCATTATTCACAAAAATCTCCCGTCGAGCGAGAATTACATGTCCGTCGAAAAGATTGCGGACGACCGTATTGTTGTCTACAACCCGAGCACTATGTCGACCCCAATGACCGAGTATATCATCAAAAAGAATGACATTGTGCGTGTCTTTAATGAATACGGTTTTGTGCTGGTCGACAACGTCGATTTTGCGACCATCATCGAGAGAAGCAAGAAATTCATTAATGGCGCTTCTACGATGGAAGATCGCCCGAGCACGCGTAACTTCTTTGAGCTGAATCGTGGCGCGATTAAGTGCGAGGGCCTGGACGTCGAGGATCTGCTGTCGTATTACGTGGTTTATGTGTTTAGCAAACGTTAATGA (SEQ ID NO: 4) SUMO-GT largeMGHHHHHHGSLQEEKPKEGVKTENDHINLKVAGQDGSVVQFKIK subunit proteinRHTPLSKLMKAYCERQGLSMRQIRFRFDGQPINETDTPAQLEMED with His tag andEDTIDVFQQQTGGMDANVVSSSTIATYIDALAKNASELEQRSTAY linkerEINNELELVFIKPPLITLTNVVNISTIQESFIRFTVTNKEGVKIRTKIPLSKVHGLDVKNVQLVDAIDNIVWEKKSLVTENRLHKECLLRLSTEERHIFLDYKKYGSSIRLELVNLIQAKTKNFTIDFKLKYFLGSGAQSKSSLLHAINHPKSRPNTSLEIEFTPRDNETVPYDELIKELTTLSRHIFMASPENVILSPPINAPIKTFMLPKQDIVGLDLENLYAVTKTDGIPITIRVTSNGLYCYFTHLGYIIRYPVKRIIDSEVVVFGEAVKDKNWTVYLIKLIEPVNAINDRLEESKYVESKLVDICDRIVFKSKKYEGPFTTTSEVVDMLSTYLPKQPEGVILFYSKGPKSNIDFKIKKENTIDQTANVVFRYMSSEPIIFGESSIFVEYKKFSNDKGFPKEYGSGKIVLYNGVNYLNNIYCLEYINTHNEVGIKSVVVPIKFIAEFLVNGEILKPRIDKTMKYINSEDYYGNQHNIIVEHLRDQSIKIGDIFNEDKLSDVGHQYANNDKFRLNPEVSYFTNKRTRGPLGILSNYVKTLLISMYCSKTFLDDSNKRKVLAIDFGNGADLEKYFYGEIALLVATDPDADAIARGNERYNKLNSGIKTKYYKFDYIQETIRSDTFVSSVREVFYFGKFNIIDWQFAIHYSFHPRHYATVMNNLSELTASGGKVLITTMDGDKLSKLTDKKTFIIHKNLPSSENYMSVEKIADDRIVVYNPSTMSTPMTEYIIKKNDIVRVFNEYGFVLVDNVDFATIIERSKKFINGASTMEDRPSTRNFFELNRGAIKCEGLDVEDLLSYYVVYVFSKR (SEQ ID NO: 8)

In some embodiments, the SUMO-GT fusion protein comprises SEQ ID NO: 8. In some embodiments, the SUMO-GT fusion protein is a heterodimer comprising SEQ ID NO: 8 and SEQ ID NO: 7. In some embodiments, the GT enzyme of the invention may be a homologue or analogue of one or the other of the GT large and small subunits. For example, a homologue or analogue of the SUMO-GT fusion protein may be a modified SUMO-GT fusion protein containing one or more amino acid substitutions, deletions, and/or insertions as compared to SEQ ID NO: 8 and/or SEQ ID NO: 7, while retaining substantial GT protein activity. Thus, in some embodiments, a SUMO-GT fusion protein suitable for the present invention is substantially homologous to the heterodimer comprising the GT small subunit (SEQ ID NO: 7) and the fusion of SUMO and the GT large subunit (SEQ ID NO: 8). In some embodiments, an enzyme suitable for the present invention has an amino acid sequence at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more identical to SEQ ID NO: 8 and SEQ ID NO: 7. In some embodiments, an enzyme suitable for the present invention is substantially identical to the heterodimer comprising the GT small subunit (SEQ ID NO: 7) and the fusion of SUMO and the GT large subunit (SEQ ID NO: 8). In some embodiments, an enzyme suitable for the present invention contains a fragment or a portion of a GT protein covalently bound to a SUMO protein.

Production of SUMO-GT Fusion Protein

Host Cells

As used herein, the term “host cells” refers to cells that can be used to produce a SUMO-GT fusion protein. In particular, host cells are suitable for producing a SUMO-GT fusion protein at a large scale. In some embodiments, host cells are able to produce SUMO-GT fusion protein in an amount of or greater than about 5 picogram/cell/day (e.g., greater than about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 picogram/cell/day). In some embodiments, host cells are able to produce SUMO-GT fusion protein in an amount ranging from about 5-100 picogram/cell/day (e.g., about 5-90 picogram/cell/day, about 5-80 picogram/cell/day, about 5-70 picogram/cell/day, about 5-60 picogram/cell/day, about 5-50 picogram/cell/day, about 5-40 picogram/cell/day, about 5-30 picogram/cell/day, about 10-90 picogram/cell/day, about 10-80 picogram/cell/day, about 10-70 picogram/cell/day, about 10-60 picogram/cell/day, about 10-50 picogram/cell/day, about 10-40 picogram/cell/day, about 10-30 picogram/cell/day, about 20-90 picogram/cell/day, about 20-80 picogram/cell/day, about 20-70 picogram/cell/day, about 20-60 picogram/cell/day, about 20-50 picogram/cell/day, about 20-40 picogram/cell/day, about 20-30 picogram/cell/day).
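As a non-limiting illustration, a productivity figure in picogram/cell/day could be derived from a harvested protein mass roughly as in the Python sketch below. The sketch assumes, as a simplification, a constant viable cell count over the production period; the function names and the example quantities are hypothetical and are not measurements from this disclosure.

```python
PICOGRAMS_PER_MILLIGRAM = 1e9  # 1 mg = 1e9 pg


def specific_productivity(protein_mg: float, viable_cells: float, days: float) -> float:
    """Return productivity in picogram/cell/day.

    Assumption: the viable cell count is treated as constant over `days`.
    """
    if viable_cells <= 0 or days <= 0:
        raise ValueError("viable_cells and days must be positive")
    if protein_mg < 0:
        raise ValueError("protein_mg must be non-negative")
    return protein_mg * PICOGRAMS_PER_MILLIGRAM / (viable_cells * days)


def within_range(value: float, low: float = 5.0, high: float = 100.0) -> bool:
    """Check a productivity value against a range such as about 5-100 pg/cell/day."""
    return low <= value <= high


# Hypothetical example: 50 mg of protein from 1e9 cells over 2 days.
example = specific_productivity(protein_mg=50.0, viable_cells=1e9, days=2.0)
```

In practice the cell count changes during culture, so an integral of viable cell density over time may be a more appropriate denominator than the constant count assumed here.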

Suitable host cells can be derived from a variety of organisms, including, but not limited to, bacteria, yeast, insects, plants, birds (e.g., avian systems), amphibians, and mammals. In some embodiments, host cells are non-mammalian cells. Non-limiting examples of non-mammalian host cells suitable for the present invention include cells and cell lines derived from Escherichia coli, Salmonella typhimurium, Bacillus subtilis, Bacillus licheniformis, Bacteroides fragilis, Clostridium perfringens, and Clostridium difficile for bacteria; Pichia pastoris, Pichia methanolica, Pichia angusta, Schizosaccharomyces pombe, Saccharomyces cerevisiae, and Yarrowia lipolytica for yeast; Spodoptera frugiperda, Trichoplusia ni, Drosophila melanogaster and Manduca sexta for insects; and Xenopus laevis for amphibians.

In some embodiments, host cells are mammalian cells. Any mammalian cell susceptible to cell culture, and to expression of polypeptides, may be utilized in accordance with the present invention as a host cell. Non-limiting examples of mammalian cells that may be used in accordance with the present invention include human embryonic kidney 293 cells (HEK293); HeLa cells; BALB/c mouse myeloma line (NSO/1, ECACC No: 85110503); human retinoblasts (PER.C6 (CruCell, Leiden, The Netherlands)); monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL 1651); human fibrosarcoma cell line (e.g., HT-1080); human embryonic kidney line (293 or 293 cells subcloned for growth in suspension culture, Graham et al., J. Gen Virol., 36:59 (1977)); baby hamster kidney cells (BHK, ATCC CCL 10); Chinese hamster ovary cells +/−DHFR (CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216 (1980)); mouse sertoli cells (TM4, Mather, Biol. Reprod., 23:243-251 (1980)); monkey kidney cells (CV1 ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587); human cervical carcinoma cells (HeLa, ATCC CCL 2); canine kidney cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human lung cells (WI-38, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse mammary tumor (MMT 060562, ATCC CCL51); TRI cells (Mather et al., Annals N.Y. Acad. Sci., 383:44-68 (1982)); MRC 5 cells; FS4 cells; a human hepatoma line (Hep G2); human cell line CAP and AGELHN; and Glycotope's panel.

Additionally, any number of available hybridoma cell lines may be utilized in accordance with the present invention. One skilled in the art will appreciate that hybridoma cell lines might have different nutrition requirements and/or might require different culture conditions for optimal growth and polypeptide or protein expression, and will be able to modify conditions as needed.

Expression Vectors

Various nucleic acid constructs can be used to express the SUMO-GT fusion protein described herein in host cells. A suitable vector construct typically includes, in addition to SUMO-GT fusion protein-encoding sequences (also referred to as SUMO-GT fusion transgene), regulatory sequences, gene control sequences, promoters, non-coding sequences and/or other appropriate sequences for expression of the protein and, optionally, for replication of the construct. Typically, the coding region is operably linked with one or more of these nucleic acid components.

“Regulatory sequences” typically refer to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters, enhancers, 5′ untranslated sequences, translation leader sequences, introns, and 3′ untranslated sequences such as polyadenylation recognition sequences. Sometimes, “regulatory sequences” are also referred to as “gene control sequences.”

“Promoter” typically refers to a nucleotide sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3′ to a promoter sequence. The promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. Accordingly, an “enhancer” is a nucleotide sequence that can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic nucleotide segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions.

The “3′ non-coding sequences” typically refer to nucleotide sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3′ end of the mRNA precursor.

The “translation leader sequence” or “5′ non-coding sequences” typically refers to a nucleotide sequence located between the promoter sequence of a gene and the coding sequence. The translation leader sequence is present in the fully processed mRNA upstream of the translation start sequence. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.

Typically, the term “operatively linked” refers to the association of two or more nucleic acid fragments on a single nucleic acid fragment so that the function of one is affected by the other. For example, a promoter is operatively linked with a coding sequence when it is capable of affecting the expression of that coding sequence (i.e., that the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operatively linked to regulatory sequences in sense or antisense orientation.

The coding region of a transgene may include one or more silent mutations to optimize codon usage for a particular cell type. For example, the codons of a SUMO-GT fusion transgene may be optimized for expression in a bacterial cell. In some embodiments, the codons of a SUMO-GT fusion transgene may be optimized for expression in an E. coli cell. In some embodiments, the codons of a SUMO-GT fusion transgene may be optimized for expression in a mammalian cell. In some embodiments, the codons of a SUMO-GT fusion transgene may be optimized for expression in a human cell.
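The notion of a silent mutation can be illustrated with the following Python sketch, which swaps codons for synonymous ones and confirms that the encoded amino acid sequence is unchanged. The standard genetic code is used for translation; the `PREFERRED` table is a hypothetical placeholder chosen only for illustration and is not a codon-usage table taken from this disclosure or from any particular host.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical host preferences (assumption for illustration only).
PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def optimize_codons(dna: str, preferred: dict = PREFERRED) -> str:
    """Replace codons with preferred synonymous codons (silent mutations only)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    optimized = "".join(preferred.get(CODON_TABLE[c], c) for c in codons)
    # Guard: every substitution must leave the protein sequence unchanged.
    if translate(optimized) != translate(dna):
        raise ValueError("preferred table introduced a non-silent change")
    return optimized
```

For instance, `optimize_codons("ATGTTAAGAAAGTAA")` rewrites the leucine, arginine and lysine codons while the translation remains `MLRK*`. Real codon optimization typically also weighs factors such as mRNA secondary structure and restriction sites, which this sketch ignores.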

Optionally, a construct may contain additional components such as one or more of the following: a splice site, an enhancer sequence, a selectable marker gene under the control of an appropriate promoter, an amplifiable marker gene under the control of an appropriate promoter, and a matrix attachment region (MAR) or other element known in the art that enhances expression of the region where it is inserted.

Once transfected or transduced into host cells, a suitable vector can express extrachromosomally (episomally) or integrate into the host cell's genome.

In some embodiments, a DNA construct that integrates into the cell's genome need include only the transgene nucleic acid sequences. In that case, the expression of the transgene is typically controlled by the regulatory sequences at the integration site. Optionally, it can include various additional regulatory sequences described herein.

Culture Medium and Conditions

The terms “medium” and “culture medium” as used herein refer to a general class of solution containing nutrients suitable for maintaining and/or growing cells in vitro. Typically, medium solutions provide, without limitation, essential and nonessential amino acids, vitamins, energy sources, lipids, and trace elements required by the cell for at least minimal growth and/or survival. In other embodiments, the medium may contain an amino acid(s) derived from any source or method known in the art, including, but not limited to, an amino acid(s) derived either from single amino acid addition(s) or from a peptone or protein hydrolysate addition(s) (including animal or plant source(s)). The medium may contain vitamins such as, but not limited to, Biotin, Pantothenate, Choline Chloride, Folic Acid, Myo-Inositol, Niacinamide, Pyridoxine, Riboflavin, Vitamin B12, Thiamine, Putrescine and/or combinations thereof. The medium may contain salts such as, but not limited to, CaCl₂, KCl, MgCl₂, NaCl, Sodium Phosphate Monobasic, Sodium Phosphate Dibasic, Sodium Selenite, CuSO₄, ZnCl₂ and/or combinations thereof. The medium may contain fatty acids such as, but not limited to, Arachidonic Acid, Linoleic Acid, Oleic Acid, Lauric Acid, Myristic Acid, as well as Methyl-beta-Cyclodextrin and/or combinations thereof. In some embodiments, medium comprises additional components such as glucose, glutamine, Na-pyruvate, insulin or ethanolamine, or a protective agent such as Pluronic F68. In some embodiments, the medium may also contain components that enhance growth and/or survival above the minimal rate, including hormones and growth factors. Medium may also comprise one or more buffering agents. The buffering agents may be designed and/or selected to maintain the culture at a particular pH (e.g., a physiological pH, such as pH 6.8 to pH 7.4). A variety of buffers suitable for culturing cells are known in the art and may be used in the methods. Suitable buffers (e.g., bicarbonate buffers, HEPES buffer, Good's buffers, etc.) are those that have the capacity and efficiency for maintaining physiological pH despite changes in carbon dioxide concentration associated with cellular respiration. The solution is preferably formulated to a pH and salt concentration optimal for cell survival and proliferation.

In some embodiments, medium may be a chemically defined medium. As used herein, the term “chemically-defined nutrient medium” refers to a medium of which substantially all of the chemical components are known. In some embodiments, a chemically defined nutrient medium is free of animal-derived components. In some cases, a chemically-defined medium comprises one or more proteins (e.g., protein growth factors or cytokines). In some cases, a chemically-defined nutrient medium comprises one or more protein hydrolysates. In other cases, a chemically-defined nutrient medium is a protein-free medium, i.e., a serum-free medium that contains no proteins, hydrolysates or components of unknown composition.

Typically, a chemically defined medium can be prepared by combining various individual components such as, for example, essential and nonessential amino acids, vitamins, energy sources, lipids, salts, buffering agents, and trace elements, at predetermined weight or molar percentages or ratios. Exemplary serum-free, in particular, chemically-defined media are described in US Pub. No. 2006/0148074, the disclosure of which is hereby incorporated by reference.

In some embodiments, a chemically defined medium suitable for the present invention is a commercially available medium such as, but not limited to, Terrific Broth, Cinnabar, 2×YT or LB. In some embodiments, a chemically defined medium suitable for the present invention is a mixture of one or more commercially available chemically defined media. In various embodiments, a suitable medium is a mixture of two, three, four, five, six, seven, eight, nine, ten, or more commercially available chemically defined media. In some embodiments, each individual commercially available chemically defined medium (e.g., such as those described herein) constitutes, by weight, 1%, 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or more, of the mixture. Ratios between each individual component medium may be determined by relative weight percentage present in the mixture. In some embodiments, protein expression is increased with the addition of IPTG to induce the promoter.

In some embodiments, a chemically defined medium may be supplemented by one or more animal derived components. Such animal derived components include, but are not limited to, fetal calf serum, horse serum, goat serum, donkey serum, human serum, and serum derived proteins such as albumins (e.g., bovine serum albumin or human serum albumin).

The present invention provides a method of producing SUMO-GT fusion protein at a large scale. Typical large-scale procedures for producing a fusion polypeptide of interest include batch cultures and fed-batch cultures. Batch culture processes traditionally comprise inoculating a large-scale production culture with a seed culture of a particular cell density, growing the cells under conditions (e.g., suitable culture medium, pH, and temperature) conducive to cell growth, viability, and/or productivity, harvesting the culture when the cells reach a specified cell density, and purifying the expressed polypeptide. Fed-batch culture procedures include an additional step or steps of supplementing the batch culture with nutrients and other components that are consumed during the growth of the cells. In some embodiments, a large-scale production method according to the present invention uses a fed-batch culture system.

Purification of Expressed SUMO-GT Fusion Protein

Various methods may be used to purify or isolate SUMO-GT fusion protein produced according to various methods described herein. In some embodiments, the expressed SUMO-GT fusion protein is secreted into the medium and thus cells and other solids may be removed, as by centrifugation or filtering for example, as a first step in the purification process. Alternatively or additionally, the expressed SUMO-GT fusion protein is bound to the surface of the host cell. In this embodiment, the host cells (for example, bacterial cells) expressing the polypeptide or protein are lysed for purification. Lysis of host cells (e.g., bacterial cells) can be achieved by any number of means well known to those of ordinary skill in the art, including physical disruption by glass beads and exposure to high pH conditions.

The SUMO-GT fusion protein may be isolated and purified by standard methods including, but not limited to, chromatography (e.g., ion exchange, affinity, size exclusion, and hydroxyapatite chromatography), gel filtration, centrifugation, or differential solubility, ethanol precipitation or by any other available technique for the purification of proteins (See, e.g., Scopes, Protein Purification Principles and Practice 2nd Edition, Springer-Verlag, New York, 1987; Higgins, S. J. and Hames, B. D. (eds.), Protein Expression: A Practical Approach, Oxford Univ Press, 1999; and Deutscher, M. P., Simon, M. I., Abelson, J. N. (eds.), Guide to Protein Purification: Methods in Enzymology (Methods in Enzymology Series, Vol 182), Academic Press, 1997, all incorporated herein by reference). For immunoaffinity chromatography in particular, the protein may be isolated by binding it to an affinity column comprising antibodies that were raised against that protein and were affixed to a stationary support. Protease inhibitors such as phenylmethylsulfonyl fluoride (PMSF), leupeptin, pepstatin or aprotinin may be added at any or all stages in order to reduce or eliminate degradation of the polypeptide or protein during the purification process. Protease inhibitors are particularly desired when cells must be lysed in order to isolate and purify the expressed polypeptide or protein.

Solubility

Various methods may be used to determine the solubility of a protein in an expression system. In an exemplary method, bacteria are spun down and resuspended in a mild lysis buffer containing 1% IGEPAL and protease inhibitors. Lysis is supported by repeated freezing and thawing of the bacteria. Soluble and insoluble fractions are separated by centrifugation. To determine the total amount of recombinant protein, the same volume of bacterial culture is spun down and lysed in the same amount of lysis buffer containing 1% IGEPAL and 0.1% SDS. Soluble and total protein are analyzed by SDS-PAGE, with western blotting if necessary. In some embodiments, the expression system is E. coli. In some embodiments, solubility of GT is improved when it has been produced as a fusion protein. In some embodiments, the fusion protein is a SUMO-GT fusion protein. In some embodiments, the SUMO-GT fusion protein has increased solubility compared to the non-fusion GT protein. In some embodiments, the increased solubility of the SUMO-GT fusion protein compared to the non-fusion GT protein is observed during shake flask production of the SUMO-GT fusion protein. In some embodiments, the increased solubility of the SUMO-GT fusion protein compared to the non-fusion GT protein is observed during fermentation production of the SUMO-GT fusion protein.
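Purely as an illustrative sketch, the comparison of soluble and total protein described above could be reduced to a single number as follows. The sketch assumes that band intensities have been quantified from the SDS-PAGE gel or western blot by some densitometry step not shown here, and that equal culture volumes and lysis-buffer volumes were used for both lysates, as in the exemplary method. The function name `soluble_fraction` is hypothetical.

```python
def soluble_fraction(soluble_signal: float, total_signal: float) -> float:
    """Fraction of recombinant protein recovered in the soluble lysate.

    soluble_signal: band intensity from the mild (IGEPAL-only) lysate.
    total_signal:   band intensity from the IGEPAL + SDS lysate.
    Both values are assumed to be in the same arbitrary densitometry units.
    """
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    if soluble_signal < 0:
        raise ValueError("soluble_signal must be non-negative")
    return soluble_signal / total_signal
```

A value near 1 would suggest that most of the expressed protein is soluble, whereas a value near 0 would suggest that most of it is in the insoluble fraction.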

Use of SUMO-GT Fusion in mRNA Capping

Production of Capped mRNAs

According to the present invention, a SUMO-GT fusion protein described herein may be used to produce capped mRNAs by in vitro transcription. Various in vitro transcription assays are available in the art and can be used to practice the present invention. For example, in vitro transcription was originally developed by Krieg and Melton (METHODS ENZYMOL., 1987, 155: 397-415) for the synthesis of RNA using an RNA phage polymerase. Typically, these reactions include at least a phage RNA polymerase (T7, T3 or SP6), a DNA template containing a phage polymerase promoter, nucleotides (ATP, CTP, GTP and UTP), and a buffer containing a magnesium salt. RNA synthesis yields may be optimized by increasing nucleotide concentrations, adjusting magnesium concentrations and by including inorganic pyrophosphatase (U.S. Pat. No. 5,256,555; Gurevich, et al., ANAL. BIOCHEM. 195: 207-213 (1991); Sampson, J. R. and Uhlenbeck, O. C., PROC. NATL. ACAD. SCI. USA. 85, 1033-1037 (1988); Wyatt, J. R., et al., BIOTECHNIQUES, 11: 764-769 (1991)). The RNA synthesized in these reactions is usually characterized by a 5′ terminal nucleotide that has a triphosphate at the 5′ position of the ribose. Typically, depending on the RNA polymerase and promoter combination used, this nucleotide is a guanosine, although it can be an adenosine (see e.g., Coleman, T. M., et al., NUCLEIC ACIDS RES., 32: e14 (2004)). In these reactions, all four nucleotides are typically included at equimolar concentrations and none of them is limiting.

Some embodiments of the invention are batch reactions; that is, all components are combined and then incubated at about 37° C. to promote the polymerization of the RNA until the reaction terminates. Typically, a batch reaction is used for convenience and to obtain as much RNA as is needed from such reactions. In some embodiments, a “fed-batch” system (see, e.g., JEFFREY A. KERN, BATCH AND FED-BATCH STRATEGIES FOR LARGE-SCALE PRODUCTION OF RNA BY IN VITRO TRANSCRIPTION (University of Colorado) (1997)) is used to increase the efficiency of the in vitro transcription reaction. All components are combined, but then additional amounts of some of the reagents, such as the nucleotides and magnesium, are added over time to try to maintain constant reaction conditions. In addition, in some embodiments, the pH of the reaction may be held at 7.4 by monitoring it over time and adding KOH as needed.

To synthesize a capped RNA by in vitro transcription, a cap analog (e.g., N-7 methyl GpppG; i.e., m⁷GpppG) is included in the transcription reaction. In some embodiments, the cap analog will be incorporated at the 5′ terminus by the enzyme guanylyl transferase. In some embodiments, the guanylyl transferase is a fusion protein. In some embodiments, the guanylyl transferase fusion protein is formed when a guanylyl transferase is covalently linked to a SUMO protein. In some embodiments, the cap analog will be incorporated only at the 5′ terminus because it does not have a 5′ triphosphate. In some embodiments using a T7, T3 or SP6 RNA polymerase, the +1 nucleotide of their respective promoters is usually a G residue and if both GTP and m⁷GpppG are present in equal concentrations in the transcription reaction, then they each have an equal chance of being incorporated at the +1 position. In some embodiments, m⁷GpppG is present in these reactions at several-fold higher concentrations than the GTP to increase the chances that a transcript will have a 5′ cap. In some embodiments, a mMESSAGE mMACHINE® kit (Cat. #1344, Ambion, Inc.) is used according to manufacturer's instructions, where it is recommended that the cap to GTP ratio be 4:1 (6 mM: 1.5 mM). In some embodiments, as the ratio of the cap analog to GTP increases in the reaction, the ratio of capped to uncapped RNA increases proportionally. Considerations of capping efficiency must be balanced with considerations of yield. Increasing the ratio of cap analog to GTP in the transcription reaction produces lower yields of total RNA because the concentration of GTP becomes limiting when holding the total concentration of cap and GTP constant. Thus, the final RNA yield is dependent on GTP concentration, which is necessary for the elongation of the transcript. The other nucleotides (ATP, CTP, UTP) are present in excess.
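The trade-off between capping and yield described above can be sketched numerically. The following Python sketch assumes the simple model stated in the text, namely that cap analog and GTP compete for the +1 position in proportion to their concentrations, so that the expected capped fraction is cap/(cap + GTP). The function names are hypothetical, and the model is an idealization rather than a measured result.

```python
def expected_capped_fraction(cap_mM: float, gtp_mM: float) -> float:
    """Expected fraction of transcripts initiated with cap analog.

    Assumption: incorporation at +1 is proportional to concentration,
    so equal concentrations give an equal chance (fraction 0.5).
    """
    total = cap_mM + gtp_mM
    if cap_mM < 0 or gtp_mM < 0 or total <= 0:
        raise ValueError("concentrations must be non-negative and not both zero")
    return cap_mM / total


def split_constant_total(total_mM: float, cap_to_gtp_ratio: float) -> tuple:
    """Return (cap_mM, gtp_mM) for a given ratio at a fixed combined concentration."""
    if total_mM <= 0 or cap_to_gtp_ratio < 0:
        raise ValueError("total_mM must be positive and ratio non-negative")
    gtp_mM = total_mM / (1.0 + cap_to_gtp_ratio)
    return total_mM - gtp_mM, gtp_mM


# The 4:1 ratio cited above (6 mM cap analog : 1.5 mM GTP).
fraction_at_4_to_1 = expected_capped_fraction(6.0, 1.5)
```

Under this model the 4:1 ratio corresponds to an expected capped fraction of about 0.8, while at a fixed combined concentration of 7.5 mM the GTP available for elongation drops to 1.5 mM, which is the yield penalty noted in the text.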

In particular embodiments, mRNA are synthesized by in vitro transcription from a plasmid DNA template encoding a gene of choice. In some embodiments, in vitro transcription includes addition of a 5′ cap structure, Cap1 (FIG. 1B), which has a 2′-O-methyl residue at the 2′ OH group of the ribose ring of base 1, by enzymatic conjugation of GTP via a guanylyl transferase. In some embodiments, in vitro transcription includes addition of a 5′ cap structure, Cap0 (FIG. 1A), which lacks the 2′-O-methyl residue, by enzymatic conjugation of GTP via a guanylyl transferase. In some embodiments, in vitro transcription includes addition of a 5′ cap of any of the cap structures disclosed herein by enzymatic conjugation of GTP via a guanylyl transferase.

Capping Efficiency

The present invention significantly increases capping efficiency. In some embodiments, the use of a SUMO-GT fusion protein in an in vitro capping assay results in at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% capped mRNA. In some embodiments, the use of a SUMO-GT fusion protein in an in vitro capping assay results in substantially 100% capped mRNA. In some embodiments, the use of a SUMO-GT fusion protein in an in vitro capping assay results in an increase in mRNA capping efficiency of at least about 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 1-fold, 1.5-fold, 2-fold, 2.5-fold, 3-fold, 3.5-fold, 4-fold, 4.5-fold, or 5-fold as compared to a control assay using a non-fusion GT protein but under otherwise identical conditions.

In addition, the present invention permits large-scale production of capped mRNA with high efficiency. In some embodiments, capped mRNA is produced at a scale of or greater than 1 gram, 5 grams, 10 grams, 15 grams, 20 grams, 25 grams, 30 grams, 35 grams, 40 grams, 45 grams, 50 grams, 75 grams, 100 grams, 150 grams, 200 grams, 250 grams, 300 grams, 350 grams, 400 grams, 450 grams, 500 grams, 550 grams, 600 grams, 650 grams, 700 grams, 750 grams, 800 grams, 850 grams, 900 grams, 950 grams, 1 kg, 2.5 kg, 5 kg, 7.5 kg, 10 kg, 25 kg, 50 kg, 75 kg, or 100 kg per batch.

Methods of estimating capping efficiency are known in the art. For example, the T7 RNA polymerase can be incubated with a cap dinucleotide, all four ribonucleotide triphosphates, [α-³²P]GTP, and a short DNA template in which G is the first ribonucleotide specified after the promoter (see Grudzien, E. et al. “Novel cap analogs for in vitro synthesis of mRNA with high translation efficiency”, RNA, 10: 1479-1487 (2004)). Any nucleotide on the 5′ side of a G residue acquires a ³²P-labeled 3′-phosphate group after RNase T2 digestion by nearest-neighbor transfer. Anion exchange chromatography is then used to resolve labeled nucleoside 3′-monophosphates, resulting from internal positions in the RNA, from 5′-terminal products. 5′-terminal products are of two types. Uncapped RNAs yield labeled guanosine 5′-triphosphate 3′-monophosphate (p3Gp*; in which * indicates the labeled phosphate group). Capped RNAs yield various 5′-terminal structures, depending on the nature of the cap analog used (m⁷Gp3Gp* and Gp3m⁷Gp* when the cap analog is m⁷Gp3G).
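
As an illustration of how the resolved 5′-terminal products might be converted into a capping efficiency, the following sketch assumes that each 5′-terminal product carries exactly one labeled phosphate, so that the measured signal (e.g., radioactive counts) is proportional to the molar amount of each species. It further treats m⁷Gp3Gp* as the forward cap orientation and Gp3m⁷Gp* as the reverse orientation. The function names and any numbers supplied to them are hypothetical.

```python
def capping_efficiency(forward_cap, reverse_cap, uncapped):
    """Fraction of 5' ends that carry a cap, in either orientation.

    Assumes one labeled phosphate per 5'-terminal product, so signals
    are directly comparable on a molar basis.
    """
    counts = (forward_cap, reverse_cap, uncapped)
    if any(c < 0 for c in counts) or sum(counts) == 0:
        raise ValueError("signals must be non-negative and not all zero")
    return (forward_cap + reverse_cap) / sum(counts)


def forward_orientation_fraction(forward_cap, reverse_cap):
    """Fraction of capped 5' ends with the cap in the forward orientation."""
    capped = forward_cap + reverse_cap
    if forward_cap < 0 or reverse_cap < 0 or capped == 0:
        raise ValueError("signals must be non-negative and not both zero")
    return forward_cap / capped


if __name__ == "__main__":
    # Hypothetical peak signals for m7Gp3Gp*, Gp3m7Gp* and p3Gp*.
    fwd, rev, unc = 600.0, 300.0, 100.0
    print(f"capping efficiency: {capping_efficiency(fwd, rev, unc):.1%}")
    print(f"forward orientation: {forward_orientation_fraction(fwd, rev):.1%}")
```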

Improved methods of directly quantitating mRNA capping efficiency in a sample (e.g., a representative aliquot sample from an in vitro synthesis reaction) are provided in WO 2014/152673, which is incorporated herein by reference. Some embodiments comprise the use of a cap specific binding substance under conditions that permit the formation of a complex between the cap specific binding substance and the capped mRNA. The formation of a complex between the cap specific binding substance and the capped mRNA allows quantitative determination of the amount of the complex (i.e., capped mRNAs) relative to a positive control of capped products or negative control of uncapped products. In other words, binding indicates the amount of capped mRNA targets in the sample and the capping efficiency in a reaction from which the sample is derived. Thus, in some embodiments, the step of quantitatively determining the amount of the complex comprises performing an ELISA-type assay wherein the cap specific binding substance is an antibody or other protein that specifically binds an mRNA cap. Complex formation can be quantified by addition of a detection agent specific for the cap specific binding substance (e.g., a goat anti-mouse antibody that binds a mouse anti-m⁷G antibody) and which produces a signal directly proportional to the amount of capped mRNA. Embodiments of the invention may be used to quantify the capping efficiency of a wide variety of RNA species, including in vitro transcribed mRNA, isolated eukaryotic mRNA, and viral RNA.

Additional improved methods of directly quantitating mRNA capping efficiency in a sample (e.g., a representative aliquot sample from an in vitro synthesis reaction) are provided in WO 2014/152659, which is incorporated herein by reference. Some embodiments of the invention comprise chromatographic methods of quantitating mRNA capping efficiency. These methods are based in part on the insight that the versatility of enzymatic manipulation can be used to increase the resolution of chromatographic separation of polynucleotides. Thus, by amplifying the power of chromatographic separation through enzymatic manipulation, embodiments of the invention increase the efficiency, quality and throughput of quantitation. For example, not only can the chromatographic methods described herein quantitate capping efficiency, they can also provide information on the modification of the cap (e.g., methylation status at particular cap positions). Thus, embodiments of the invention can simultaneously quantitate capping efficiency and the efficiency of cap modification (e.g., methylation efficiency). This quantification provides important characterization of an mRNA drug product that has significant impact on the protein production.

The invention will be more fully understood by reference to the following examples. They should not, however, be construed as limiting the scope of the invention. All literature citations are incorporated by reference.

EXAMPLES

Example 1: SUMO-GT Construct Design

A new construct incorporating a small ubiquitin-like modifier (SUMO) tag covalently linked and co-expressed with the large subunit fraction of a guanylyl transferase (GT) heterodimer was synthesized.

Small ubiquitin-like modifier (SUMO) DNA: (SEQ ID NO: 5)
GAAGAGAAACCGAAAGAGGGCGTTAAGACCGAGAATGACCACATTAACCTGAAGGTCGCTGGTCAAGATGGCAGCGTGGTGCAGTTTAAGATCAAGCGTCACACGCCGTTGAGCAAGCTGATGAAGGCTTACTGCGAGCGTCAGGGTCTGAGCATGCGTCAGATCCGCTTTCGTTTCGATGGCCAGCCGATCAATGAGACTGACACCCCAGCGCAACTGG

Guanylyl transferase (GT) large subunit DNA: (SEQ ID NO: 2)
AGATGGAAGATGAAGATACCATCGACGTCTTTCAGCAACAGACCGGTGGTATGGATGCTAACGTCGTTAGCAGCAGCACCATTGCGACTTACATTGATGCACTGGCCAAAAACGCATCTGAGCTTGAGCAGCGCAGCACCGCCTACGAGATCAATAACGAATTGGAGCTGGTTTTCATTAAACCGCCGCTGATCACGCTGACGAACGTCGTGAACATTAGCACGATTCAAGAGAGCTTTATTCGTTTCACCGTTACCAATAAAGAAGGCGTGAAGATCCGTACCAAGATTCCGCTGAGCAAAGTGCATGGTCTGGACGTGAAAAATGTGCAGCTGGTTGATGCGATCGATAACATCGTGTGGGAGAAGAAATCTTTGGTCACGGAAAATCGTCTGCACAAGGAATGTCTGCTGCGTCTGTCAACCGAAGAACGCCACATCTTCCTGGACTACAAGAAGTATGGTTCCAGCATCCGTCTGGAACTGGTGAACCTGATTCAGGCAAAGACCAAGAACTTCACCATTGACTTCAAACTGAAGTATTTCCTGGGCTCTGGTGCACAGAGCAAATCCAGCTTGTTGCACGCGATTAACCATCCGAAGAGCCGTCCGAATACGAGCCTGGAGATCGAATTCACGCCGCGTGATAACGAAACCGTTCCGTACGATGAGCTGATTAAAGAACTGACGACGTTGAGCCGCCACATCTTTATGGCCAGCCCGGAAAACGTGATCCTTAGCCCGCCTATCAATGCGCCGATTAAAACCTTTATGTTACCGAAACAAGACATTGTGGGTCTGGACCTGGAAAACCTGTACGCGGTCACCAAAACGGACGGCATTCCGATCACGATTCGTGTTACCAGCAATGGTCTGTACTGCTATTTCACTCATTTGGGCTATATCATTCGTTATCCGGTGAAACGCATCATTGATTCTGAGGTTGTCGTTTTCGGCGAAGCAGTCAAGGACAAGAATTGGACTGTGTACCTGATCAAATTGATTGAACCGGTTAACGCCATCAATGACCGCCTGGAAGAGTCGAAATATGTTGAAAGCAAACTGGTGGATATTTGTGATCGTATCGTGTTCAAGAGCAAGAAATATGAAGGCCCGTTCACCACGACCAGCGAAGTTGTTGACATGCTGAGCACCTATCTGCCGAAACAACCTGAGGGTGTGATTCTGTTTTACTCCAAGGGTCCGAAGAGCAACATTGATTTCAAAATCAAGAAAGAGAATACCATTGATCAGACCGCCAACGTTGTGTTCCGCTATATGTCCAGCGAGCCTATCATTTTCGGTGAGTCGAGCATCTTTGTTGAATACAAAAAGTTTAGCAACGATAAGGGTTTTCCGAAAGAATACGGTTCCGGTAAGATTGTGTTGTACAACGGCGTCAATTATCTGAACAACATCTACTGTCTGGAGTACATCAATACCCATAACGAAGTTGGCATTAAGTCTGTTGTCGTCCCGATCAAATTCATCGCGGAGTTCCTGGTTAACGGTGAGATTCTGAAGCCGCGTATTGATAAAACTATGAAATACATTAACTCCGAAGATTACTACGGTAATCAGCATAACATCATCGTCGAGCACTTGCGTGATCAAAGCATTAAGATCGGTGACATCTTTAACGAAGATAAGCTGAGCGATGTAGGCCACCAGTATGCGAACAATGACAAATTTCGCCTGAATCCGGAAGTCAGCTACTTTACGAATAAGCGCACCCGTGGTCCACTGGGTATCCTGAGCAATTATGTTAAAACCCTGTTGATTTCCATGTACTGCTCCAAAACGTTCCTGGACGACAGCAACAAGCGCAAAGTTCTGGCGATCGACTTCGGTAATGGTGCCGATCTGGAGAAGTACTTTTATGGTGAGATCGCATTGCTGGTTGCTACCGACCCGGATGCAGATGCGATCGCCCGTGGCAACGAGCGTTACAATAAGCTGAATAGCGGTATCAAGACCAAATACTACAAATTCGACTATATTCAAGAGACGATCCGCTCGGACACCTTTGTATCCAGCGTGCGTGAGGTGTTTTACTTCGGTAAATTCAACATCATTGACTGGCAATTCGCCATTCACTATAGCTTTCACCCACGCCACTATGCGACGGTCATGAACAACCTGTCTGAGCTGACCGCGAGCGGCGGTAAAGTTCTGATCACCACGATGGACGGTGACAAGCTGTCTAAACTGACCGACAAAAAGACCTTCATTATTCACAAAAATCTCCCGTCGAGCGAGAATTACATGTCCGTCGAAAAGATTGCGGACGACCGTATTGTTGTCTACAACCCGAGCACTATGTCGACCCCAATGACCGAGTATATCATCAAAAAGAATGACATTGTGCGTGTCTTTAATGAATACGGTTTTGTGCTGGTCGACAACGTCGATTTTGCGACCATCATCGAGAGAAGCAAGAAATTCATTAATGGCGCTTCTACGATGGAAGATCGCCCGAGCACGCGTAACTTCTTTGAGCTGAATCGTGGCGCGATTAAGTGCGAGGGCCTGGACGTCGAGGATCTGCTGTCGTATTACGTGGTTTATGTGTTTAGCAAACGTTAATGA

Guanylyl transferase (GT) small subunit DNA: (SEQ ID NO: 3)
ATGGACGAAATTGTCAAGAATATCCGTGAAGGTACCCACGTTTTACTGCCATTCTACGAGACGCTGCCGGAACTGAACCTGAGCCTGGGTAAAAGCCCTCTGCCGAGCCTGGAGTATGGTGCGAACTATTTTCTGCAGATTTCCCGTGTAAACGATTTGAACCGCATGCCGACGGACATGCTGAAACTGTTCACCCACGACATCATGCTGCCGGAATCTGATCTGGATAAAGTTTACGAGATCTTGAAAATCAATTCAGTGAAGTACTATGGCCGTAGCACCAAGGCCGATGCGGTGGTCGCAGACCTGAGCGCGCGTAACAAACTGTTTAAACGTGAACGTGACGCAATTAAGAGCAATAACCATCTGACCGAGAACAATTTGTACATCAGCGACTACAAGATGTTGACTTTTGACGTGTTTCGTCCGCTGTTCGACTTTGTTAATGAGAAATACTGCATTATCAAGCTGCCGACGTTGTTTGGTCGCGGCGTCATTGATACGATGCGCATTTACTGCTCTCTCTTCAAGAATGTGCGCCTGCTGAAGTGTGTCTCCGACAGCTGGCTGAAAGATAGCGCTATTATGGTTGCGAGCGACGTGTGTAAAAAGAACCTGGATCTGTTCATGAGCCACGTGAAGAGCGTTACCAAAAGCAGCAGCTGGAAAGACGTTAACAGCGTCCAGTTCTCCATTCTGAATAACCCGGTCGATACCGAGTTTATCAACAAGTTCCTTGAATTCAGCAATCGCGTTTATGAGGCCCTGTATTACGTTCATAGCCTGCTGTATAGCTCCATGACCTCTGATAGCAAATCGATCGAGAATAAACACCAACGTCGTCTGGTGAAACTGCTGCTGTAATGA

SUMO-GT large subunit DNA construct with His tag and linker: (SEQ ID NO: 4)
ATGGGCCATCATCATCACCATCACGGCAGCCTGCAAGAAGAGAAACCGAAAGAGGGCGTTAAGACCGAGAATGACCACATTAACCTGAAGGTCGCTGGTCAAGATGGCAGCGTGGTGCAGTTTAAGATCAAGCGTCACACGCCGTTGAGCAAGCTGATGAAGGCTTACTGCGAGCGTCAGGGTCTGAGCATGCGTCAGATCCGCTTTCGTTTCGATGGCCAGCCGATCAATGAGACTGACACCCCAGCGCAACTGGAGATGGAAGATGAAGATACCATCGACGTCTTTCAGCAACAGACCGGTGGTATGGATGCTAACGTCGTTAGCAGCAGCACCATTGCGACTTACATTGATGCACTGGCCAAAAACGCATCTGAGCTTGAGCAGCGCAGCACCGCCTACGAGATCAATAACGAATTGGAGCTGGTTTTCATTAAACCGCCGCTGATCACGCTGACGAACGTCGTGAACATTAGCACGATTCAAGAGAGCTTTATTCGTTTCACCGTTACCAATAAAGAAGGCGTGAAGATCCGTACCAAGATTCCGCTGAGCAAAGTGCATGGTCTGGACGTGAAAAATGTGCAGCTGGTTGATGCGATCGATAACATCGTGTGGGAGAAGAAATCTTTGGTCACGGAAAATCGTCTGCACAAGGAATGTCTGCTGCGTCTGTCAACCGAAGAACGCCACATCTTCCTGGACTACAAGAAGTATGGTTCCAGCATCCGTCTGGAACTGGTGAACCTGATTCAGGCAAAGACCAAGAACTTCACCATTGACTTCAAACTGAAGTATTTCCTGGGCTCTGGTGCACAGAGCAAATCCAGCTTGTTGCACGCGATTAACCATCCGAAGAGCCGTCCGAATACGAGCCTGGAGATCGAATTCACGCCGCGTGATAACGAAACCGTTCCGTACGATGAGCTGATTAAAGAACTGACGACGTTGAGCCGCCACATCTTTATGGCCAGCCCGGAAAACGTGATCCTTAGCCCGCCTATCAATGCGCCGATTAAAACCTTTATGTTACCGAAACAAGACATTGTGGGTCTGGACCTGGAAAACCTGTACGCGGTCACCAAAACGGACGGCATTCCGATCACGATTCGTGTTACCAGCAATGGTCTGTACTGCTATTTCACTCATTTGGGCTATATCATTCGTTATCCGGTGAAACGCATCATTGATTCTGAGGTTGTCGTTTTCGGCGAAGCAGTCAAGGACAAGAATTGGACTGTGTACCTGATCAAATTGATTGAACCGGTTAACGCCATCAATGACCGCCTGGAAGAGTCGAAATATGTTGAAAGCAAACTGGTGGATATTTGTGATCGTATCGTGTTCAAGAGCAAGAAATATGAAGGCCCGTTCACCACGACCAGCGAAGTTGTTGACATGCTGAGCACCTATCTGCCGAAACAACCTGAGGGTGTGATTCTGTTTTACTCCAAGGGTCCGAAGAGCAACATTGATTTCAAAATCAAGAAAGAGAATACCATTGATCAGACCGCCAACGTTGTGTTCCGCTATATGTCCAGCGAGCCTATCATTTTCGGTGAGTCGAGCATCTTTGTTGAATACAAAAAGTTTAGCAACGATAAGGGTTTTCCGAAAGAATACGGTTCCGGTAAGATTGTGTTGTACAACGGCGTCAATTATCTGAACAACATCTACTGTCTGGAGTACATCAATACCCATAACGAAGTTGGCATTAAGTCTGTTGTCGTCCCGATCAAATTCATCGCGGAGTTCCTGGTTAACGGTGAGATTCTGAAGCCGCGTATTGATAAAACTATGAAATACATTAACTCCGAAGATTACTACGGTAATCAGCATAACATCATCGTCGAGCACTTGCGTGATCAAAGCATTAAGATCGGTGACATCTTTAACGAAGATAAGCTGAGCGATGTAGGCCACCAGTATGCGAACAATGACAAATTTCGCCTGAATCCGGAAGTCAGCTACTTTACGAATAAGCGCACCCGTGGTCCACTGGGTATCCTGAGCAATTATGTTAAAACCCTGTTGATTTCCATGTACTGCTCCAAAACGTTCCTGGACGACAGCAACAAGCGCAAAGTTCTGGCGATCGACTTCGGTAATGGTGCCGATCTGGAGAAGTACTTTTATGGTGAGATCGCATTGCTGGTTGCTACCGACCCGGATGCAGATGCGATCGCCCGTGGCAACGAGCGTTACAATAAGCTGAATAGCGGTATCAAGACCAAATACTACAAATTCGACTATATTCAAGAGACGATCCGCTCGGACACCTTTGTATCCAGCGTGCGTGAGGTGTTTTACTTCGGTAAATTCAACATCATTGACTGGCAATTCGCCATTCACTATAGCTTTCACCCACGCCACTATGCGACGGTCATGAACAACCTGTCTGAGCTGACCGCGAGCGGCGGTAAAGTTCTGATCACCACGATGGACGGTGACAAGCTGTCTAAACTGACCGACAAAAAGACCTTCATTATTCACAAAAATCTCCCGTCGAGCGAGAATTACATGTCCGTCGAAAAGATTGCGGACGACCGTATTGTTGTCTACAACCCGAGCACTATGTCGACCCCAATGACCGAGTATATCATCAAAAAGAATGACATTGTGCGTGTCTTTAATGAATACGGTTTTGTGCTGGTCGACAACGTCGATTTTGCGACCATCATCGAGAGAAGCAAGAAATTCATTAATGGCGCTTCTACGATGGAAGATCGCCCGAGCACGCGTAACTTCTTTGAGCTGAATCGTGGCGCGATTAAGTGCGAGGGCCTGGACGTCGAGGATCTGCTGTCGTATTACGTGGTTTATGTGTTTAGCAAACGTTAATGA

Small ubiquitin-like modifier (SUMO) protein: (SEQ ID NO: 1)
EEKPKEGVKTENDHINLKVAGQDGSVVQFKIKRHTPLSKLMKAYCERQGLSMRQIRFRFDGQPINETDTPAQLEMEDEDTIDVFQQQTGG

Guanylyl transferase (GT) large subunit protein: (SEQ ID NO: 6)
MDANVVSSSTIATYIDALAKNASELEQRSTAYEINNELELVFIKPPLITLTNVVNISTIQESFIRFTVTNKEGVKIRTKIPLSKVHGLDVKNVQLVDAIDNIVWEKKSLVTENRLHKECLLRLSTEERHIFLDYKKYGSSIRLELVNLIQAKTKNFTIDFKLKYFLGSGAQSKSSLLHAINHPKSRPNTSLEIEFTPRDNETVPYDELIKELTTLSRHIFMASPENVILSPPINAPIKTFMLPKQDIVGLDLENLYAVTKTDGIPITIRVTSNGLYCYFTHLGYIIRYPVKRIIDSEVVVFGEAVKDKNWTVYLIKLIEPVNAINDRLEESKYVESKLVDICDRIVFKSKKYEGPFTTTSEVVDMLSTYLPKQPEGVILFYSKGPKSNIDFKIKKENTIDQTANVVFRYMSSEPIIFGESSIFVEYKKFSNDKGFPKEYGSGKIVLYNGVNYLNNIYCLEYINTHNEVGIKSVVVPIKFIAEFLVNGEILKPRIDKTMKYINSEDYYGNQHNIIVEHLRDQSIKIGDIFNEDKLSDVGHQYANNDKFRLNPEVSYFTNKRTRGPLGILSNYVKTLLISMYCSKTFLDDSNKRKVLAIDFGNGADLEKYFYGEIALLVATDPDADAIARGNERYNKLNSGIKTKYYKFDYIQETIRSDTFVSSVREVFYFGKFNIIDWQFAIHYSFHPRHYATVMNNLSELTASGGKVLITTMDGDKLSKLTDKKTFIIHKNLPSSENYMSVEKIADDRIVVYNPSTMSTPMTEYIIKKNDIVRVFNEYGFVLVDNVDFATIIERSKKFINGASTMEDRPSTRNFFELNRGAIKCEGLDVEDLLSYYVVYVFSKR

Guanylyl transferase (GT) small subunit protein: (SEQ ID NO: 7)
MDEIVKNIREGTHVLLPFYETLPELNLSLGKSPLPSLEYGANYFLQISRVNDLNRMPTDMLKLFTHDIMLPESDLDKVYEILKINSVKYYGRSTKADAVVADLSARNKLFKRERDAIKSNNHLTENNLYISDYKMLTFDVFRPLFDFVNEKYCIIKLPTLFGRGVIDTMRIYCSLFKNVRLLKCVSDSWLKDSAIMVASDVCKKNLDLFMSHVKSVTKSSSWKDVNSVQFSILNNPVDTEFINKFLEFSNRVYEALYYVHSLLYSSMTSDSKSIENKHQRRLVKLLL

SUMO-GT large subunit protein with His tag and linker: (SEQ ID NO: 8)
MGHHHHHHGSLQEEKPKEGVKTENDHINLKVAGQDGSVVQFKIKRHTPLSKLMKAYCERQGLSMRQIRFRFDGQPINETDTPAQLEMEDEDTIDVFQQQTGGMDANVVSSSTIATYIDALAKNASELEQRSTAYEINNELELVFIKPPLITLTNVVNISTIQESFIRFTVTNKEGVKIRTKIPLSKVHGLDVKNVQLVDAIDNIVWEKKSLVTENRLHKECLLRLSTEERHIFLDYKKYGSSIRLELVNLIQAKTKNFTIDFKLKYFLGSGAQSKSSLLHAINHPKSRPNTSLEIEFTPRDNETVPYDELIKELTTLSRHIFMASPENVILSPPINAPIKTFMLPKQDIVGLDLENLYAVTKTDGIPITIRVTSNGLYCYFTHLGYIIRYPVKRIIDSEVVVFGEAVKDKNWTVYLIKLIEPVNAINDRLEESKYVESKLVDICDRIVFKSKKYEGPFTTTSEVVDMLSTYLPKQPEGVILFYSKGPKSNIDFKIKKENTIDQTANVVFRYMSSEPIIFGESSIFVEYKKFSNDKGFPKEYGSGKIVLYNGVNYLNNIYCLEYINTHNEVGIKSVVVPIKFIAEFLVNGEILKPRIDKTMKYINSEDYYGNQHNIIVEHLRDQSIKIGDIFNEDKLSDVGHQYANNDKFRLNPEVSYFTNKRTRGPLGILSNYVKTLLISMYCSKTFLDDSNKRKVLAIDFGNGADLEKYFYGEIALLVATDPDADAIARGNERYNKLNSGIKTKYYKFDYIQETIRSDTFVSSVREVFYFGKFNIIDWQFAIHYSFHPRHYATVMNNLSELTASGGKVLITTMDGDKLSKLTDKKTFIIHKNLPSSENYMSVEKIADDRIVVYNPSTMSTPMTEYIIKKNDIVRVFNEYGFVLVDNVDFATIIERSKKFINGASTMEDRPSTRNFFELNRGAIKCEGLDVEDLLSYYVVYVFSKR
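
The relationship between the DNA construct and the protein it encodes can be checked by translating the coding sequence in frame. The sketch below is illustrative only: it uses the standard genetic code and translates just the first 57 nucleotides of the SUMO-GT construct DNA listed above, which should correspond to the start codon, the His tag, the linker and the first residues of SUMO at the start of the fusion protein sequence. The names `translate`, `CONSTRUCT_DNA_PREFIX` and `FUSION_PROTEIN_PREFIX` are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, stop_at_stop=True):
    """Translate a DNA coding sequence in frame 1 using the standard code."""
    dna = "".join(dna.split()).upper()  # drop whitespace from wrapped listings
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*" and stop_at_stop:
            break
        protein.append(residue)
    return "".join(protein)


# First 57 nt of the SUMO-GT construct DNA and first 19 residues of the
# fusion protein, copied from the sequences listed above.
CONSTRUCT_DNA_PREFIX = "ATGGGCCATCATCATCACCATCACGGCAGCCTGCAAGAAGAGAAACCGAAAGAGGGC"
FUSION_PROTEIN_PREFIX = "MGHHHHHHGSLQEEKPKEG"

if __name__ == "__main__":
    translated = translate(CONSTRUCT_DNA_PREFIX)
    print(translated)
    print("matches listed protein prefix:", translated == FUSION_PROTEIN_PREFIX)
```

The same function can be applied to a full-length sequence to compare a construct against its listed protein; translation stops at the first in-frame stop codon.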

Example 2: Production of SUMO-GT Protein

Shake Flask

Production of SUMO-GT fusion protein can be performed according to standard methods and procedures. For example, to test and compare expression of the GT and SUMO-GT fusion proteins, a single colony of the E. coli Rosetta strain (Novagen) containing each of the SUMO-eGFP plasmids was inoculated into 5 ml of Luria-Bertani (LB) media containing 100 μg/ml kanamycin and 30 μg/ml chloramphenicol. This strain is derived from the lambda DE3 lysogen strain and carries a chromosomal copy of the IPTG-inducible T7 RNA polymerase along with tRNAs on a pACYC-based plasmid. The cells were grown at 37° C. overnight with shaking at 250 rpm. The next morning the overnight culture was transferred into 100 ml fresh medium to permit exponential growth. When the OD600 value reached ~0.6-0.7, protein expression was induced by addition of 1 mM IPTG (isopropyl-β-D-thiogalactopyranoside), followed by prolonged cultivation at either 37° C. for 3 hours or 20° C. overnight (about 15 hours).
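
The induction step amounts to a simple dilution calculation (C1·V1 = C2·V2). The sketch below assumes a 1 M IPTG stock, which is not specified in the protocol above, and neglects the small volume added to the culture; the function name is illustrative.

```python
def stock_volume_ul(final_mM, culture_ml, stock_mM):
    """Volume of stock (in microliters) needed to reach a final concentration.

    Uses C1*V1 = C2*V2 and ignores the volume contributed by the stock itself.
    """
    if final_mM <= 0 or culture_ml <= 0 or stock_mM <= 0:
        raise ValueError("all inputs must be positive")
    if final_mM >= stock_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * culture_ml * 1000.0 / stock_mM


if __name__ == "__main__":
    # 1 mM final IPTG in a 100 ml culture, from an assumed 1 M (1000 mM) stock.
    print(f"{stock_volume_ul(1.0, 100.0, 1000.0):.0f} µl of stock")
```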

After the E. coli cells were harvested from LB medium (100 ml) by centrifugation (8,000×g for 10 min at 4° C.), the cell pellets were suspended in 6 ml of lysis buffer (PBS containing 300 mM NaCl, 10 mM imidazole, 0.1% Triton X-100 and 1 mM PMSF, pH 8.0). The cells were lysed by sonication (at 50% output for 5×30 second pulses). The sonication was conducted with the tube jacketed in wet ice and 1 min intervals between the pulse cycles to prevent heating. After the lysates were incubated with DNase and RNase (each at 40 μg/ml) for 15 min to digest nucleic acids, they were centrifuged at 20,000×g for 30 min at 4° C., and the supernatant (soluble protein fraction) was collected. The pellets were washed once with 6 ml of the lysis buffer to further extract the soluble fraction; the wash (6 ml) was combined with the previous extract (6 ml) to make a final volume of 12 ml for the soluble protein sample.

Insoluble protein samples were prepared from E. coli inclusion bodies. Briefly, after the extract containing soluble proteins was removed, the pellets containing inclusion bodies were suspended in the denaturing solubilization buffer (Novagen) that contained 50 mM CAPS (pH 11.0), 0.3% N-laurylsarcosine, and 1 mM DTT and incubated for 20 min at room temperature with shaking. The extract (insoluble protein fraction) was obtained by high-speed centrifugation (80,000×g for 20 min at 4° C.).

For detection of expressed proteins using SDS-PAGE, 5 μl of the samples prepared above were mixed with 3 μl of SDS-PAGE sample buffer containing SDS and β-mercaptoethanol and were heated at 95° C. for 5 min to facilitate denaturation and reduction of proteins. Proteins were visualized using 15% SDS-polyacrylamide gels with Tris-Glycine running buffer and Coomassie blue staining.

Fermentation

The substantial increase in the solubility of the final SUMO-GT complexed enzyme was also reproduced by fermentation. Fermentation was performed according to standard methods and procedures. For example, fermentation methods for production of SUMO-GT fusion protein comprised cell lysis, Immobilized Metal Affinity Chromatography (IMAC), Cation Exchange Chromatography, Anion Exchange Chromatography, and Tangential Flow Filtration (TFF) formulation. Quality testing of the SUMO-GT fusion protein that resulted from fermentation comprised Reducing SDS-PAGE to determine purity and identity, Reverse-Phase HPLC to determine purity, A280 measurement of concentration and Limulus amebocyte lysate (LAL) assay to test for endotoxin.

As shown in FIG. 2, the yield of soluble SUMO-GT protein produced by fermentation is comparable to that of GT protein produced via the shake flask method.

EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. The scope of the present invention is not intended to be limited to the above Description, but rather is as set forth in the following claims:

1.-6. (canceled)
7. A fusion protein, wherein the fusion protein comprises guanylyl transferase and a small ubiquitin-like molecule (SUMO) protein comprising 90% identity to SEQ ID NO: 8 and SEQ ID NO: 7.
8. The fusion protein of claim 7, wherein the guanylyl transferase comprises SEQ ID NO: 8 and SEQ ID NO: 7.
9.-10. (canceled)
11. The fusion protein of claim 7, wherein the fusion protein has comparable phosphatase activity, guanylyl transferase activity and methylation activity relative to a wild-type guanylyl transferase protein.
12. A vector encoding a fusion protein comprising guanylyl transferase protein and a small ubiquitin-like molecule (SUMO) protein.
13. The vector of claim 12, wherein the vector comprises SEQ ID NO: 1 and SEQ ID NO: 2.
14. The vector of claim 12, wherein the vector comprises SEQ ID NO: 1, SEQ ID NO: 2, and SEQ ID NO: 3.
15. The vector of claim 12, wherein the vector comprises SEQ ID NO: 4 and SEQ ID NO: 3.
16. A method to produce a guanylyl transferase by fermentation, comprising: a) culturing in a fermentation medium a microorganism that is transformed with at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding a guanylyl transferase fusion protein that has an amino acid sequence that is at least 90% identical to SEQ ID NO: 8 and SEQ ID NO: 7; and b) collecting a product produced from the step of culturing.
17. (canceled)
18. The method of claim 16, wherein the guanylyl transferase fusion protein has comparable phosphatase activity, guanylyl transferase activity and methylation activity relative to a wild-type guanylyl transferase protein.
19. The method of claim 16, wherein the guanylyl transferase fusion protein comprises a small ubiquitin-like molecule (SUMO) protein.
20. The method of claim 19, wherein the guanylyl transferase fusion protein comprises SEQ ID NO: 8.
21. The method of claim 19, wherein the SUMO protein is bound to the guanylyl transferase by a covalent link.
22. The method of claim 21, wherein the covalent link is between the SUMO protein and a large subunit of the guanylyl transferase.
23. The method of claim 16, wherein the fermentation medium is selected from the group consisting of Terrific Broth, Cinnabar, 2×YT and LB.
24. The method of claim 16, wherein the microorganism is a bacterium.
25. The method of claim 16, wherein the nucleic acid sequence encoding the guanylyl transferase is at least 90% identical to SEQ ID NO: 2 and SEQ ID NO: 3.
26. The method of claim 16, wherein the recombinant nucleic acid molecule further comprises a nucleic acid sequence encoding a small ubiquitin-like molecule (SUMO) protein.
27. The method of claim 26, wherein the nucleic acid sequence encoding a small ubiquitin-like molecule (SUMO) protein is at least 90% identical to SEQ ID NO: 1.
28. The method of claim 16, wherein the product is a guanylyl transferase.
29. The method of claim 28, wherein the product is a guanylyl transferase fusion protein that further comprises a small ubiquitin-like molecule (SUMO) protein.
30. (canceled)